.. _advanced-c++:

Advanced C++
############

By now, you have a decent handle on the basics of C++. You know enough to go
off and write some pretty nice programs, but you are far from done learning
all about C++. This is one big, huge, honking language, with what some think
are far too many complex features. But it is one of the most popular and
important languages in the world today!

There are many other languages to pick from, and each has its place in
programming. Just as an example, here is a list of languages I have used to
write a serious program in the last five years or so:

* C/C++
* Python
* SQL
* Intel/AMD Assembly Language
* Atmel AVR microcontroller assembly code
* ARM microcontroller assembly code
* Haskell
* ML
* Lisp
* Ruby
* PC Batch
* Bash Shell
* Visual Basic (ugh!)
* Java (double ugh!)
* JavaScript
* PHP
* HTML/XHTML
* XML
* SVG

There are surely more that I cannot remember now. No wonder I have so many
books.

I do not claim to be proficient in all of those languages at any moment; I
need a bit of time to come back up to speed with those I do not use on a daily
basis. But I can sit down and read a program written in any of these
languages and figure out what is going on well enough to understand the
program logic, and translate the ideas into the language I need to use for my
current project. When you can do this, you are ready to call yourself a
serious programmer! (I probably am such a beast - I have been doing this since
1965 - yikes!)

So, what else does C++ have to offer us?
Here are a few topics from an advanced C++ book I have on my bookshelf:

* Advanced basic features of methods and classes
* More on inheritance and overloading
* Converting objects to some other type (like `floats` to `ints`)
* Error handling and exceptions
* Specialized containers
* Strings and Regular Expressions
* Advanced Input/Output
* Using code templates
* Memory Management
* Multi-threading
* Code efficiency
* Debugging and testing
* Using standard libraries and design patterns
* Learning how software engineering is really done

That is a lot of topics, some of which are fascinating all by themselves. Like
anything worth learning, you need to explore your tools and learn them well to
become a "professional". Hopefully, you will become one, and hopefully, you
will become able to work in more than one language.

Let's dip into a few of these topics as a way to close out this course. Since
some of these topics are part of the final projects you are working on, I will
pick only a few from this list that are not part of that project.

Multi-threading
***************

This is one of the most important, and least taught, concepts (in my opinion)
of everything on this list, so let's start here.

We have reached a point in the evolution of computer design where it is very
hard to make a single processor much smaller or faster. So, we have turned to
another tactic to get more speed. If one processor is good, two are better. If
two are better, why not four, or ten, or a thousand?

How in the world can we write programs that work on machines with this kind of
design? The answer is to write code in the form of a "thread", which is
nothing more than a single flow of instructions of the sort we have been
writing all along. The trick is to write your program so that multiple threads
can be running at the same time, and we can use the multiple processors in
your computer to get more work done in a shorter period of time.
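
To make the idea of a thread a bit more concrete, here is a minimal sketch using ``std::thread`` from the C++11 standard library. The names ``square_into`` and ``run_workers`` are made up for this illustration, and this is only one way to launch threads, not necessarily the way the course examples do it. Each thread writes only to its own slot in the results, so no synchronization is needed beyond waiting for every thread to finish.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical worker: squares its own id and stores the answer in its own slot.
void square_into(int id, std::vector<int>& results) {
    results[id] = id * id;
}

// Launches `count` threads, waits for all of them, and returns their results.
std::vector<int> run_workers(int count) {
    assert(count >= 0);
    std::vector<int> results(static_cast<std::size_t>(count));
    std::vector<std::thread> workers;
    workers.reserve(results.size());

    // Each thread starts running as soon as it is created.
    for (int id = 0; id < count; ++id) {
        workers.emplace_back(square_into, id, std::ref(results));
    }

    // join() blocks until that thread has finished its work.
    for (std::thread& worker : workers) {
        worker.join();
    }
    return results;
}
```

Calling ``run_workers(4)`` hands back the four squares once every thread has been joined. On some systems you may need to add ``-pthread`` to the ``g++`` command line to build code like this.
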
Every once in a while, these independent threads might need to synchronize
themselves so they can pass data back and forth. This is where things get
interesting, and complicated!

The big problem we face in education these days is to get you to see problems
in a way where this idea makes sense. Let's take a simple problem, and show
how this can work.

Adding up a bunch of numbers
============================

Wait, this is too simple. Or is it? Yes, we can do this the old, standard way:

.. code-block:: c

    int data[1000] = { /* ... the numbers to add ... */ };
    int sum = 0;
    for (int i = 0; i < 1000; i++)
        sum += data[i];

But what could we do if we had more than one processor available? Say we have
four; then we can do this:

.. code-block:: c

    // thread 1
    int sum1 = 0;
    for (int i = 0; i < 250; i++)
        sum1 += data[i];

    // thread 2
    int sum2 = 0;
    for (int i = 250; i < 500; i++)
        sum2 += data[i];

    // thread 3
    int sum3 = 0;
    for (int i = 500; i < 750; i++)
        sum3 += data[i];

    // thread 4
    int sum4 = 0;
    for (int i = 750; i < 1000; i++)
        sum4 += data[i];

    // back in the main program, after all four threads have finished
    int sum = sum1 + sum2 + sum3 + sum4;

What you have to remember is that all four of those threads will be working at
the same time. The only issue we have is this: is it possible for each thread
to access the same data array simultaneously? The answer is yes, if you do
things right. Here every thread only reads the array and writes to its own sum
variable, so the threads never step on each other. (How to handle the harder
cases is a topic for another advanced course!)

Here is a simple example of using threads to do several things "at once". In
this case we are trying to print out "hello" a bunch of times, but by doing
the processing in parallel. Here is the code:

.. literalinclude:: code2/threads.cpp
    :linenos:
    :caption: threads.cpp

This code was written back when machines were not that fast, and it ran fine.
On my hot-rod MacBook Pro, the output looked a bit strange:

.. command-output:: g++ -o test threads.cpp
    :cwd: code2

.. command-output:: ./test
    :cwd: code2

Here we see a classic problem. The program started out in a loop that spun off
five independent threads, each of which began running as soon as it was
launched by the operating system.
We got several of them running before thread 0 began generating output, and
still the main loop was trying to spin off more threads. The output from that
point on is a garbled mess of text from `main` and each of the threads that
has not yet finished. Phew!

To control this mess, we need ways to synchronize access to resources (like
the output stream) so one thread can get something done before getting zapped
by another thread. This is done with special flags called "mutexes" (short for
mutual exclusion), which basically give one thread at a time permission to
access something. In this example that something might have been the output
stream, but it might have been access to a variable that needs to be shared
between the different threads.

I think you can see that this kind of programming can get hard to think
through, especially in weird but possible cases. Suppose one thread is waiting
for access to a variable owned by another thread, and that second thread is
waiting for access to a variable owned by the first thread. This situation is
called a "deadlock", since no one is going anywhere (like on I-35, these
days!).

Learning how to do all of this is a challenge, but it will be the way we do
business in the future. You ought to start looking at this kind of programming
as soon as you can. There are even new languages coming along (Google's Go
language is one) that can help you write programs of this sort!

Writing Efficient Code
**********************

We do not pay much attention to efficiency while we are learning to program,
but it gets important in a hurry. You might generate code that is too big, or
too slow, and need to find a way to improve your situation. Writing efficient
code requires really understanding your machine.

.. note::

    I am biased a bit here, since I teach courses in assembly language
    programming and computer architecture (COSC2325).

Profiling
=========

Efficiency and performance are related terms.
Calling a method costs processing time, so splitting your work into lots of
tiny methods can actually hurt you in the long run (although an optimizing
compiler will often inline small methods for you). It might make it easier to
write the code that way in the beginning, but the performance of your code may
suffer as a result.

There are tools called "profiling" tools that will help you identify where
your program is spending most of its time. Once you identify these hot-spots,
you can focus on rewriting that code to make things run faster. In some cases,
writing that code in another, faster, language may be the answer. This is
where some folks drop all the way down to assembly language, the lowest level
language in any machine, but the most powerful level, to get the code to
perform better.

I highly recommend that you make learning something about the inside of your
machine, and its native language, part of your future education in
programming. I have really enjoyed programming at that low level ever since I
started doing it on a Cray-1 supercomputer back in the mid 1970s!

Caching objects
===============

Efficiency can come from other schemes as well. Suppose you set up a program
that allocates objects in memory, like the nodes of a linked list. You need to
create those objects as you build the list, but it takes time to do so. Once
you are done with a particular object, you normally delete it and return that
block of memory to the system. That also takes time.

A better scheme might be to take the object and link it into a list of
currently unused objects. When you need another object of that type, just
unlink one from the unused list, reset the attributes, and use it. That saves
the time it takes to create a new one. Of course, this might lead to problems
if the number of unused objects gets large, so maybe you keep a pool of
objects around, then delete the extra objects when that makes sense.
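
The caching scheme described above can be sketched in a few lines. This is only an illustration under some assumptions: the ``Node`` and ``NodePool`` names are made up, the node holds a single ``int``, and the "delete the extra objects" policy is reduced to a simple cap on how many unused nodes the pool will hold.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical list node; `next` doubles as the link in the unused list.
struct Node {
    int value = 0;
    Node* next = nullptr;
};

// Minimal cache of unused nodes: released nodes are kept for reuse, up to a cap.
class NodePool {
public:
    explicit NodePool(std::size_t max_cached) : max_cached_(max_cached) {}
    NodePool(const NodePool&) = delete;
    NodePool& operator=(const NodePool&) = delete;

    // Delete whatever is still sitting in the unused list.
    ~NodePool() {
        while (free_list_ != nullptr) {
            Node* doomed = free_list_;
            free_list_ = free_list_->next;
            delete doomed;
        }
    }

    // Reuse an unused node if one is available; otherwise create a new one.
    Node* acquire(int value) {
        Node* node = free_list_;
        if (node != nullptr) {
            free_list_ = node->next;   // unlink it from the unused list
            --cached_;
            node->next = nullptr;      // reset the attributes before reuse
        } else {
            node = new Node;
        }
        node->value = value;
        return node;
    }

    // Keep the node for later, or delete it if the cache is already full.
    void release(Node* node) {
        assert(node != nullptr);
        if (cached_ < max_cached_) {
            node->next = free_list_;
            free_list_ = node;
            ++cached_;
        } else {
            delete node;
        }
    }

    std::size_t cached() const { return cached_; }

private:
    Node* free_list_ = nullptr;
    std::size_t cached_ = 0;
    std::size_t max_cached_;
};
```

A list class would call ``acquire`` wherever it used ``new`` and ``release`` wherever it used ``delete``. Whether this actually beats the standard allocator depends on your program, so it is worth profiling before and after.
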

Debugging
*********

Basically, a debugger gives you the power to run your code one line (or
statement) at a time, and inspect all of the things living in memory at any
moment. You can set up "breakpoints" where your program will stop and return
control to the debugger so you can look around. The process is fairly simple
to explain, but the tools can be a bit overwhelming.

Many programming classes do not use a debugger. I do not in many of my classes
because the class is often too introductory to spend time learning a complex
tool for trying to figure out what is going on in your program. That may well
be a mistake, but the lack of easy to learn debugging tools gets in the way.
The modern "test-driven development" process actually reduces the need for
using debuggers, since you are testing your code as you go, and hopefully do
not have many bugs to track down!

The "godfather" of all debugging tools is Microsoft's `WinDBG`. This tool can
diagnose problems in programs running on the other side of the planet over the
Internet!

In the Linux world, the standard debugger is `GDB` (GNU Debugger), which began
life as a command-line tool. In this one, you do everything by issuing simple
text commands. It works, but is not ideal for learning. Still, it lives inside
IDE tools like Dev-C++ (but has been flaky enough there that not many folks
use it).

I do use this tool in my assembly language class, since a good debugger can
show you what your code is doing at the level of the particular machine your
code is running on. That means working in assembly language, where you can run
your code one instruction at a time.

.. note::

    Should you trust your compiler all the time? Actually, no! Any program of
    reasonable complexity has bugs, including the compiler we all take for
    granted as being right. I found a bug in an IBM mainframe compiler when I
    was in graduate school that was serious enough that when I reported it to
    IBM, they sent out teams of engineers to replace the compilers in all of
    their mainframes, including six that were being used in the Apollo program
    by NASA! WOW!

Here is a simple example of running GDB
=======================================

Start with a simple program:

.. code-block:: c

    int main(int argc, char * argv[]) {
        int a = 5;
        int b = a + 6;
        return 0;
    }

We can build this program using a simple command line:

.. code-block:: text

    g++ -g test.cpp -o test

The ``-g`` option tells the compiler to add information to our executable
(such as line numbers and variable names) that will be useful when debugging.

Run your code under GDB control
-------------------------------

Now, we will fire up GDB and give it our executable to work with. The debugger
will load the program and set things up so the debugger can run the program
and stop it on demand.

Start off by doing this:

.. code-block:: text

    gdb test
    ...
    (gdb)

We will set up a ``breakpoint``, which tells the debugger to stop the program
when it reaches a specific point. In this test, we want to stop the program
when it reaches the ``main`` routine. In spite of what you may have been
thinking about what happens when your program runs, ``main`` is not the first
thing that happens. Instead, there is a lot of initialization code that must
be run to set up the run-time system for the program.

Here is how we set the breakpoint:

.. code-block:: text

    (gdb) break main
    Breakpoint 1 at 0x4004b8: file test.cpp, line 2.
    (gdb)

The debugger is telling you exactly where the breakpoint was set. In this case
it was at address 0x4004b8, which is the address in the program's code where
execution will stop (the program is loaded in low memory).

Now, let's run the program and see if it stops at the breakpoint:

.. code-block:: text

    (gdb) run
    Starting program: /home/rblack/COSC1337/test

    Breakpoint 1, main () at test.cpp:2
    2           int a = 5;
    (gdb)

The program has stopped at this point and we see the next line to be processed
(it has not run through that one, yet).

Now, try this:

.. code-block:: text

    (gdb) next 2
    4           return 0;
    (gdb)

In this example, we told the debugger to step to the next line twice (the 2
controlled that). The program is now at the last line.

I admit, this is not an ideal debugger, but it is used widely in the Linux
world, and is actually pretty powerful. The Dev-C++ IDE runs this debugger
behind a graphical interface, and there are similar graphical front ends
available on other platforms as well. You can even control it with a script
written in something like Python!

Adding Print Statements
=======================

In desperation, many programmers turn to output statements to see what is
going on in places in their code. The problem with that is that many bugs are
sensitive to exactly how your code and data are laid out in memory, and adding
an output statement moves things around. A classic example of this is
over-indexing an array and stepping on some piece of data that is not in your
array. By adding output statements, the data could end up moving to some other
memory location, and the error might disappear for a time, only to reappear
later in some other chunk of code.

That is why thinking more, and coding less, can be so important in engineering
a good program.

Text Processing
***************

A surprising amount of programming involves processing text. We think of
computers as "number crunchers" but they also digest text data. The compiler
is a familiar example of such a program. It has to read your source code and
break it down into chunks, then figure out what all of those chunks are asking
the machine to do. The first part, we call lexical analysis. That involves
finding reserved words, variable names, punctuation and multi-symbol
operators, and those dreaded semicolons!
Much of this processing involves something called "regular expressions".

Regular Expressions
===================

There is an entire programming language of a sort dedicated to describing text
patterns that we need to identify in a bunch of text. Here is such a pattern,
looking for a simple phone number:

.. code-block:: text

    ^\(\d{3}\) ?\d{3}[ -]?\d{4}

OMG! What a mess! Actually, it is not that hard to decode once you know a few
things:

* ``^`` marks the beginning of a line
* ``\(`` means exactly the open parenthesis symbol
* ``\d{3}`` means exactly three digits
* ``\)`` means exactly the close parenthesis symbol
* The space followed by the ``?`` means the space is optional
* ``\d{3}`` as before
* ``[ -]?`` means we have an optional space or dash
* ``\d{4}`` means we have exactly four digits

This pattern will match any of these example numbers:

* (512)555-1234
* (123) 555 1234

Most languages support setting up regular expressions so the lexical analysis
of your program is easy to do. Here is a pattern for a legal C++ variable
name:

.. code-block:: text

    "[a-zA-Z_][a-zA-Z0-9_]*"

Each pair of brackets accepts any one of the characters listed inside it.
"a-z" means any lower case letter. The star at the end allows zero or more of
the characters in the brackets to the left of the star. So, any of these are
legal:

* a
* _a
* a9
* _aBcDeF01z (boy is that a terrible variable name, but a legal one!)

But not this:

* 9aB (digits cannot be the first character!)

Processing regular expressions is possible in C++, but ugly. It is much easier
to do in Perl or Python! Here is a C++ example:

.. literalinclude:: code2/regex.cpp
    :linenos:

And here is what we get:

.. code-block:: text

    $ g++ -std=c++11 regex.cpp -o regex
    $ ./regex
    Object attribute reference found

This pattern allows legal object attribute references in a string.

On to your Next Course
**********************

I hope you have had fun learning a bit of C++. It is a good tool to start
exploring serious programming with, but by no means the only one around.
My suggestion is to stick with it and learn enough about the advanced concepts
to feel like you really can tackle serious programming projects. Then, take
those concepts and explore another language. Get seriously into Python, or
explore Go, or "D", which are newer languages supporting parallel processing.
Learn how to take advantage of the graphics card in your system to do number
crunching. There are many cool things to learn, and you have a good start now!

Good luck on your next step!