This will contain posts I deem to be informational regarding various aspects of programming languages.

Updated Feb 26, 2014 because I’m dumb.

Cross-Compiling shouldn’t be so Hard

I’m creating this post in the off-chance that someone else with a Raspberry Pi who can’t figure out how to cross-compile from Windows will find this useful (or find it at all).

I have been working on and off on a project that will run on Windows, x86 Linux, and ARM Linux.  The problem is, my Linux machines are a little slow.  My x86 Linux box is an Everex Cloudbook sporting a 1500MHz VIA C7-M processor with 512MB of memory.  My ARM Linux box is a Raspberry Pi with an ARM11 CPU clocked at 700MHz and 512MB of RAM.  Compiling on my Windows PC is much faster (even if MinGW make won’t run multiple threads at once).  Cross-compiling for x86 Linux isn’t really fruitful, as it’s not my main target.  However, cross-compiling for my Raspberry Pi would be tremendously beneficial (a 30-second compile on my PC takes 5-10 minutes on the RPi).

Find a Working Cross Compiler

Now, most websites online talk about cross-compiling from x86 Linux to ARM Linux.  The RPi creators even have their own cross-compiler available.  There are, however, 2 problems with this.  First, I don’t want to install Linux on my PC, virtual or otherwise.  Second, their GCC cross-compiler is only at version 4.6, and I need C++11 features only available in GCC 4.7+.  My solution was to start with a pre-built GCC 4.7 ARM-Linux cross-compiler and modify things until I got something working on my RPi.  Below are the steps I had to take to finally get it working.

1) Find an x86 Windows to ARM Linux cross-compiler.  This was fairly simple.  A quick Google search led me to http://linaro.org.  They have GCC 4.7 and GCC 4.8 ARM Linux cross-compiler binaries for Windows.

2) Find the compiler options needed to tell Linaro GCC to compile specifically for the Raspberry Pi.  This wasn’t too difficult.  Linaro is pre-built to target ARMv7 instructions and a Cortex-A9 CPU, but the following compiler flags take care of that issue.  These options (some of which may not actually be necessary) tell the compiler to target the RPi hardware.
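The flag list itself was lost from this post.  For the original RPi’s ARM1176 (ARMv6, hard-float Raspbian) hardware, the invocation would have looked something like this (the exact flag set is my reconstruction, and as noted, some flags may be redundant):

```shell
# target the RPi's ARM1176 core with the hard-float ABI (reconstructed flags)
arm-linux-gnueabihf-g++ -march=armv6zk -mtune=arm1176jzf-s \
    -mfpu=vfp -mfloat-abi=hard -std=c++11 -o myprog myprog.cpp
```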

3) Make the compiler link against the RPi static libraries.  This was the hardest part.  Using steps 1 and 2 I was able to get a file to compile for the RPi, but the problem is that Linaro is built against version 2.6.32 of the Linux Kernel, while the Raspbian image I am using is based on 2.6.26.  This meant the generated executable was incompatible.  The solution was to copy the static libraries from my RPi to my Windows PC and make the linker use them.  This was a two-step process.  Note that it took much trial and error, so some of this might be unneeded, but this is the process I got working on my machine.

3a) Copy the libraries:  This involved tar-ing the following directories and using SCP to get them off the RPi (I tar-ed instead of “zipping” because data transfer was faster than trying to get the RPi to compress the data).  Also note that you must tell tar to follow both symlinks and hardlinks, as both methods are in use.  This doubles/triples the size, but you cannot preserve Linux links on a Windows file system.

  • /usr/lib/arm-linux-gnueabihf
  • /usr/lib/gcc/arm-linux-gnueabihf/4.7/*
  • /opt/vc/lib
  • /lib/arm-linux-gnueabihf
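The exact commands were lost from this post; assuming GNU tar on the Pi, the bundling step would look roughly like this (-h dereferences symlinks; --hard-dereference does the same for hard links):

```shell
# on the RPi: bundle the library directories, following sym- and hard links
tar -chf rpi-libs.tar --hard-dereference \
    /usr/lib/arm-linux-gnueabihf \
    /usr/lib/gcc/arm-linux-gnueabihf/4.7 \
    /opt/vc/lib \
    /lib/arm-linux-gnueabihf

# then pull the archive over to the Windows PC (e.g. with scp or pscp)
scp pi@raspberrypi:rpi-libs.tar .
```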

3b) Remove all libraries that came with Linaro and replace them with the RPi libraries.  This is necessary because Linaro was built against a different version of the Linux Kernel than my version of Raspbian.  This was not apparent at first, and the seg-faults I was getting on the RPi were of no help.  What did help was running “ file <executable> ” on the RPi, which showed that my executable was linked against 2.6.32, while the RPi is only at 2.6.26.  I then had to delete the following files from my Linaro installation:

  • <install>\lib\gcc\arm-linux-gnueabihf\4.7.3\*.o|*.a
  • <install>\lib\gcc\arm-linux-gnueabihf\4.7.3\arm-linux-gnueabihf
  • <install>\arm-linux-gnueabihf\lib
  • <install>\arm-linux-gnueabihf\libc\lib
  • <install>\arm-linux-gnueabihf\libc\usr\lib

(don’t get me started on how idiotic this layout is. There is actually an “arm-linux-gnueabihf\libc\usr\lib\arm-linux-gnueabihf\usr\lib” directory!?!)

UPDATE:  I did NOT need to copy the files in this step (see step 3c).  For the record, I had placed the RPi libraries as follows:

  • /lib/arm-linux-gnueabihf became <install>\arm-linux-gnueabihf\libc\lib\arm-linux-gnueabihf
  • /usr/lib/gcc/arm-linux-gnueabihf/4.7/* were copied to the new <install>\arm-linux-gnueabihf\libc\lib directory
  • /usr/lib/arm-linux-gnueabihf became <install>\arm-linux-gnueabihf\libc\usr\lib\arm-linux-gnueabihf
  • /opt/vc/lib wasn’t copied; I guess I didn’t need it after all.  It might be needed later.

3c) UPDATE:  Turns out that in my repeated trial and error, I skipped a step that had been working previously.  Instead of copying the RPi libraries into Linaro, I actually just needed to specify their locations by passing the following lines to G++ (D:\rpi-libs is my Windows-local copy of the RPi static libraries):
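The lines themselves were lost from this post; they would have been -L search paths (plus -rpath-link for the linker’s recursive dependency search), something like the following reconstruction:

```shell
# point G++ at the local mirror of the RPi libraries (paths reconstructed
# from the step-3a directory list)
-LD:\rpi-libs\lib\arm-linux-gnueabihf
-LD:\rpi-libs\usr\lib\arm-linux-gnueabihf
-LD:\rpi-libs\usr\lib\gcc\arm-linux-gnueabihf\4.7
-Wl,-rpath-link,D:\rpi-libs\lib\arm-linux-gnueabihf
```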

3d) UPDATE: One final problem is that every .o file must be passed to the linker IF that file does not exist in a location that was specified in the configuration when the linker was built.  As such, I found I needed to copy the following files to my SOURCE directory because their locations were HARD CODED into the linker.  I could NOT pass these in as objects to the compiler, because then they would be listed twice (again, they were HARD CODED), and the compiler still wouldn’t be able to find the file (I would pass it a /path/to/crt1.o, but it would still look for crt1.o):
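The file list was lost from this post, but given that the linker searches for crt1.o by bare name, these were almost certainly the C runtime startup objects.  The copy step would look something like this (paths assume the step-3a layout and are my reconstruction):

```shell
# copy the C-runtime startup objects next to the sources
copy D:\rpi-libs\usr\lib\arm-linux-gnueabihf\crt1.o .
copy D:\rpi-libs\usr\lib\arm-linux-gnueabihf\crti.o .
copy D:\rpi-libs\usr\lib\arm-linux-gnueabihf\crtn.o .
```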

Of course, via my makefile, I can copy these files before compilation and delete them afterwards, and no one’s the wiser.  Still, it’s a terrible hack that I can only avoid by a) copying my files to the predefined paths built into the Linaro GNU linker or b) building my own linker.  I’m lazy, and this solution works.

Finally, Success

Once I had performed the above steps, I was able to build my executable and SCP it to my RPi.  A quick “file <executable>” showed that it had been linked against the proper libraries.  A quicker “chmod +x” and my executable was chugging away (or actually, waiting for a connection, but still, success!!).  Now when I make changes on my Windows PC, I can quickly compile them for my Raspberry Pi.  No more waiting for the little ARM processor to chug away.

What I Learned

First of all, I learned no one seems to cross-compile FROM Windows.  In fact, searching “windows linux cross compiler”, “linux cross compiler windows binaries”, etc., all turn up links for people wanting to “Build Windows applications from Linux”.  I still don’t have an x86 cross-compiler, and I refuse to install Cygwin.  How is there not a native Windows executable for an x86 Linux cross-compiler?  Maybe there is and it’s buried under the Linux-to-Windows links?

Second, the GNU linker is hard-coded to search certain library paths, and there is no command-line option to fix this.  If I were on Linux, supposedly setting the LIBRARY_PATH environment variable would have worked, but that’s a hack.  You should be able to directly tell the linker what you want it to do.

Finally, it seems no one is bothered by the fact that the default solution seems to be “make your own cross-compiler”.  Why should thousands of individual developers each need to create their own cross-compiler?  It is nice that the Raspberry Pi creators have a cross-compiler, but it’s stuck at GCC 4.6 and only works under Linux.  It would be nice if they at least had GCC 4.7 and GCC 4.8; then maybe I could have been motivated to install a Linux VM.

My Linaro GCC 4.7 Raspbian Linux 2.6.26 Cross-Compiler Windows Binaries (whew)

UPDATE:  These files still work, but I think I updated my steps above with a more portable solution.  I have uploaded my files here: http://svn.hellwig.us/binaries/Linaro/linaro-14.01-gcc-4.7-arm-RPi-raspbian-2.6.26.zip.  Simply unzip this directory somewhere on your computer.  Point your makefile or environment to the “<install>\bin” directory, and you should be good to go building RPi executables from your Windows PC.  I did not modify any of the Linaro executables or header files (header files might cause some problems down the line, but hopefully not many).  I did not modify any of the Raspbian static library files.  I simply merged the Linaro toolchain with the Raspbian libraries and voilà, a cross-compiler was born.  Don’t forget, if you use this, that you’ll need to specify certain compiler options to target the RPi hardware specifically.

Ok, so here is another coding question:

Given an input string, determine whether or not the string is composed of only abbreviations from the periodic table of elements.

This sounds kind of complex, like the stacking-athletes question I answered previously, but it actually has a two-line solution:

That’s right: given a string in $line, $result will be true if the string is composed solely of element abbreviations.  Simple, huh?  As for complexity, who knows.  I don’t know enough about how regular expressions are evaluated to guess, but I assume the engine has been made reasonably efficient.
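The two-line solution itself was lost from this post (it referenced Perl-style $line and $result variables).  A C++ analogue of the same idea, using std::regex and an abbreviated element list for illustration (a real script would list all ~118 symbols):

```cpp
#include <cassert>
#include <regex>
#include <string>

// Abbreviated symbol list for illustration; the real list has ~118 entries.
// regex_match() requires the whole string to be consumed, so no ^...$
// anchors are needed.
static const std::regex kElements(
    "(H|He|Li|Be|B|C|N|O|F|Ne|Na|Mg|Al|Si|P|S|Cl|K|Ca|Fe|Ni|Cu|Zn|Ge|Sn|I|W|Au|U)+");

bool isElemental(const std::string& line) {
    return std::regex_match(line, kElements);
}
```

The regex engine’s backtracking handles the overlap between one- and two-letter symbols, so “GeNiUS” matches as Ge, Ni, U, S.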

Of course, additional optimizations can be made.  For instance, “J” does not appear in any element abbreviation, so any line containing “J” could be disqualified immediately (though that requires a linear scan of the data first).

Anyway, here’s the full script, in case you want to impress your friends.

Second example problem:

Given an m * n field of trees and open spaces, find the biggest square house that can be built without overlapping any trees.

A simple, suboptimal solution starts by scanning every space to see whether it is open (starting in the upper-left), finding the largest possible square anchored at that space, and keeping the maximum over the entire area.

Note that each loop has two conditions, the total width/height of the area AND the max value.  This max-value check keeps the loops from searching for open squares that could not possibly be larger than the current largest square (e.g. too close to one edge of the area).

Now, the function itself, starting at the specified location, searches diagonally for trees, stopping at the first and/or nearest tree it encounters.  It does this by first checking the current square for a tree.  If there is no tree it scans downward until it a) finds a tree or b) reaches the edge of the map.  It repeats the process to the right.  It then increments the “top-left” square inward and repeats the process.

The code above has one special optimization.  We first declare a global variable  std::unordered_map<int,int> FoundMax; .  This unordered (hash) map is keyed on the x,y coordinates of a square and stores the maximum size of a square found starting at those coordinates.  This way, if we happen to be searching a diagonal subset of a previously searched larger square, we will not have to loop further to find the size of that square.  Note that because the search works diagonally, we could actually use membership in the FoundMax map as an exit condition (or perhaps check it in the outer main loop to avoid calling Find2 in the first place).
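The code itself was lost from this post; below is a sketch of the approach as described.  Apart from FoundMax and Find2, the names are mine, and the tree check is written as an expanding row/column scan rather than the exact diagonal walk described:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// 'T' marks a tree, '.' an open space.
std::unordered_map<int, int> FoundMax;  // (row,col) -> largest square found there

int Find2(const std::vector<std::string>& g, int r, int c) {
    int key = r * 10000 + c;            // assumes fewer than 10000 columns
    auto it = FoundMax.find(key);
    if (it != FoundMax.end()) return it->second;
    int rows = (int)g.size(), cols = (int)g[0].size(), size = 0;
    for (;;) {
        int rr = r + size, cc = c + size;
        if (rr >= rows || cc >= cols) break;
        bool open = true;               // check the new bottom row and right column
        for (int i = c; i <= cc && open; ++i) open = g[rr][i] != 'T';
        for (int i = r; i <= rr && open; ++i) open = g[i][cc] != 'T';
        if (!open) break;
        ++size;
    }
    return FoundMax[key] = size;
}

int LargestSquare(const std::vector<std::string>& g) {
    FoundMax.clear();
    int best = 0;
    // both loops stop early once the remaining strip is narrower than best
    for (int r = 0; (int)g.size() - r > best; ++r)
        for (int c = 0; (int)g[0].size() - c > best; ++c)
            best = std::max(best, Find2(g, r, c));
    return best;
}
```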

The complexity of this algorithm is N for the outermost loops (once for each square).  From each square, we examine at most M additional squares (where M < N), making for a worst-case complexity of O(N*M).  In practice, the max-value cutoffs and the FoundMax cache dismiss most squares quickly, so the typical running time is well below that worst case.

Ideas for further optimization: For all upper-left squares with the same X value, the maximum distance down is less than the maximum distance of the previous Y value down. We could therefore skip searching for the next downward tree with a lookup of the nearest downward tree from Y-1 (if we were storing that value). The same goes for the nearest tree to the right from the current X for squares starting on the same Y.


So I was recently presented with an example coding exercise as stated below:

Every athlete is characterized by his mass ‘m’ (in kg) and strength ‘s’ (in kg).
You are to find the maximum number of athletes that can form a tower standing one upon another.
An athlete can hold a tower of athletes with total mass equal to his strength or less than his strength.
Input contains the number of athletes n and their parameters.
For example:

m1 s1
m2 s2

mn sn

If mi > mj then si > sj, but athletes with equal masses can be of different strength.
Number of athletes n < 100000. Masses and strengths are positive integers less than 2000000.
For example:
Input #1

3 4
2 2
7 6
4 5

Would yield
Output #1


Working through the Solution

Phase one: figure out how to store the data.  Easy, store it in a linked list.  It could be a list of tuples/pairs, but I actually created a class to store the mass/strength, because I want to be able to do a complex sort on the data (see below).

Now, the problem description states that if one athlete weighs more than another, he will have greater strength as well.  This provides us with one crucial optimization: we know that the strongest athlete will also be the heaviest.  This athlete, therefore, should go on the bottom of the tower.  We could do a linear search for the strongest, but we can also find the strongest after sorting the list, and we’ll want a sorted list for later.  Therefore, using the comparison operators we described above, we can sort the list of athletes and begin processing like so:

In the above code, we grab the first entry after sorting, which is (by definition) the strongest.  We start with that athlete’s strength as the maximum weight for any tower to be formed on top.  We then pass the array of athletes, the starting point, and the weight remaining to our recursive function.  The “tot” variable is there for analysis purposes.  The function is defined as below.
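The class, sort, and recursive function were lost from this post; here is a reconstruction of the algorithm as described (identifier names are mine, and the analysis “tot” counter is omitted):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Athlete {
    int mass, strength;
    // heavier athletes sort first; ties broken by strength, strongest first
    bool operator<(const Athlete& o) const {
        return mass != o.mass ? mass > o.mass : strength > o.strength;
    }
};

// Max athletes stackable with 'weightLeft' capacity, drawn from athletes[start..].
int TowerAbove(const std::vector<Athlete>& a, size_t start, int weightLeft) {
    int best = 0, lastMass = -1;
    for (size_t i = start; i < a.size(); ++i) {
        if ((int)(a.size() - i) <= best) break;  // item 1: can't beat best
        if (a[i].mass > weightLeft) continue;    // item 2: too heavy to fit
        if (a[i].mass == lastMass) continue;     // item 3: first of a mass is strongest
        lastMass = a[i].mass;
        // remaining capacity is limited by both the tower and this athlete
        int cap = std::min(weightLeft - a[i].mass, a[i].strength);
        best = std::max(best, 1 + TowerAbove(a, i + 1, cap));
    }
    return best;
}

int MaxTower(std::vector<Athlete> a) {
    if (a.empty()) return 0;
    std::sort(a.begin(), a.end());
    // first entry is the heaviest, hence strongest: the base of the tower
    return 1 + TowerAbove(a, 1, a[0].strength);
}
```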

The analysis of the algorithm is actually contained in the comments above, but I’ll point out some key points here:

  1. The maximum size of the tower (within the current iteration) is limited to the number of athletes left in the list.  If we find a tower equal to the size of the list, we are done.
  2. If the current athlete weighs more than the remaining weight limit, they (and any other athletes of the same mass) cannot fit on the tower.
  3. The first athlete of any mass is also the strongest (thanks to the sorted list).  Therefore, we only need to consider ONE athlete of any given mass (e.g. of ALL athletes with mass 4, none can support a taller tower than the strongest of them).

The loop in the above function scans the list of athletes to determine the maximum size of a tower each athlete could support while standing on top of the tower as it exists before the function is called.  Therefore, the first time the function is called, it stands each athlete, in sequence, on the shoulders of the strongest athlete, and sees how tall the tower can be built.

If the current athlete can fit on the tower at all (based on the remaining weight limit), the recursive call then determines the size of the tower that could stand on that athlete’s shoulders.  Thanks to item 3 above, we only perform the recursive call once per athlete mass.  This greatly reduces the number of recursive calls.  The return value of the recursive call is compared to the current maximum tower height found previously (and stored if greater).  The loop then checks whether the maximum tower height equals the remaining number of athletes.  If it does, we have found the greatest possible tower height and can return.

The function returns the tallest tower it found that could stand on the shoulders of the previous athlete.  When the recursion returns completely, we have solved the problem.  The worst case of this algorithm is actually 2^n, which is terrible, because in principle every possible combination of towers must be considered.  However, thanks to our end conditions (the three items above), the practical running time is drastically reduced: duplicate masses are skipped (item 3), athletes that cannot fit are pruned (item 2), and the search stops as soon as a tower uses every remaining athlete (item 1).  On typical inputs this behaves close to a single pass over the list.

Anyway, the full source is in my (poorly misspelled) repository here: http://svn.hellwig.us/repo/VC11/ExcersizeTest.


So, getting back into C++ programming, I’ve been allocating plenty of dynamic memory (using new and new[]).  However, I’ve only been using delete, never delete[].  In fact, I’ll be honest, I forgot about delete[], but that’s OK, I didn’t need the [] form anyway.  Why, you ask?

Take this example:
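The example code was lost from this post; a minimal reconstruction (the function name is mine):

```cpp
#include <cstring>

// allocate a raw character buffer with new[] and fill it
char* CopyString(const char* text) {
    char* buf = new char[std::strlen(text) + 1];
    std::strcpy(buf, text);
    return buf;
}
// The caller frees the buffer; the textbook form is delete[] buf, though the
// post argues plain delete happens to work here because char has no
// destructor (strictly, the standard still calls the mismatch undefined
// behavior).
```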

Now, almost every example you see online will actually have delete[] buf; here.  That is the correct form, but in this case it makes no practical difference.  Why?  The only difference between delete and delete[] is element destruction.

When you call delete, all the memory you allocated with new[] is still freed.  The problem is that only the first element (the one being directly pointed to) is destructed; none of the other members of your array are.  In my example this happens to be harmless, because the array was an array of “char”s, and you can’t destruct a char in C++; it’s not a class type.  (Strictly speaking, the standard makes any mismatched delete of a new[] allocation undefined behavior, so delete[] is still the safe habit.)


When is delete[] useful?

When you allocate an array of objects that themselves allocate more memory, delete[] makes sure that each entry in the allocated array is properly destructed. Example:
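The example was lost from this post; a sketch of the idea, with a static counter added so the destructor calls can be observed (names are mine):

```cpp
#include <cassert>

// Each Widget owns heap memory; delete[] must run ~Widget() for every
// element or that memory leaks.
struct Widget {
    int* data;
    static int alive;                 // instrumentation: live-object count
    Widget() : data(new int[16]) { ++alive; }
    ~Widget() { delete[] data; --alive; }
};
int Widget::alive = 0;

void Demo() {
    Widget* arr = new Widget[5];      // alive == 5 here
    delete[] arr;                     // all five destructors run; alive == 0
}
```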


Can I just call delete[] all the time?

Sure, well, almost.  delete[] works on a specific type of pointer.  When you write myclass * myptr = new myclass[5] , you are telling the system to reserve sizeof(myclass)*5 bytes of memory.  (Conceptually, this size is what the memory allocator records; real implementations typically also store the element count.)  When you call delete[] myptr; , the memory allocator goes through roughly the following process:

  1. Look up how much memory was reserved starting at the address stored in myptr (answer: sizeof(myclass) * 5).
  2. Start at offset = 0.
  3. Call the myclass destructor at the address myptr + offset.
  4. Increase offset by sizeof(myclass).
  5. If offset < total allocated memory, go to 3.  Else, continue.
  6. Free all memory reserved (sizeof(myclass) * 5 bytes) starting at the address stored in myptr.

However, if you call delete[] (void *) myptr; , the element type is lost: there is no such thing as sizeof(void) or a destructor for void, so none of the objects in the array will be destructed.  In fact, deleting through a void* is undefined behavior; at best it degenerates into a plain delete (void *)ptr; that frees the memory without destructing anything.

Similarly, if you cast the pointer to a different type, delete[] will not know the proper size of each object, and any attempt to destruct those “objects” will most likely result in errors or crashes.  Example:

Of course, casting pointers in C++ is always dangerous.  The compiler and system do not keep track of allocated memory by type, only size.  Therefore, you should always be careful when having to cast a pointer.

Note: there are acceptable times to cast a pointer (e.g. when reading raw bytes that are actually multi-byte values, such as casting an int* to a char*), but oftentimes this can also be handled with unions.
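As a sketch of one of those acceptable casts (the function name is mine): inspecting the bytes of an int through an unsigned char*, which the language explicitly allows to alias any object:

```cpp
#include <cassert>
#include <cstddef>

// Read byte i of an int's object representation via a char pointer.
// char/unsigned char pointers may legally alias any object type.
unsigned char ByteAt(const int& value, std::size_t i) {
    const unsigned char* bytes =
        reinterpret_cast<const unsigned char*>(&value);
    return bytes[i];
}
```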

Anyway, that’s my “quick” overview of the delete and delete[] operations in C++.