Wednesday, 15 February 2012

Static code analysis

One simple way of improving your code is to pass it through a static code analysis tool and fix the reported errors and warnings. Such tools usually point out lines that could cause buffer overruns, use of uninitialized memory, null pointer dereferences, memory and resource leaks or exception safety problems, as well as lines whose style could be improved.

If you are using Visual Studio 2010, you can use its static code analysis: open Project Properties -> Configuration Properties -> Code Analysis. Select the desired build configuration and platform and tick Enable Code Analysis for C/C++ on Build. The Output window will show the code analysis report (search for the Running Code Analysis for C/C++... lines).

I want to show here how it looks in practice. Let's say we have some smelly code:

C.h:


main.cpp:


I recently played with Cppcheck and this is what it reports for the code above:

Cppcheck-report

Visual Studio prints code analysis messages in the Output window and for the code above it contains the following:

1>------ Build started: Project: CppcheckTest, Configuration: Debug Win32 ------
1>Build started 15/02/2012 19:49:12.
1>InitializeBuildStatus:
1> Touching "Debug\CppcheckTest.unsuccessfulbuild".
1>ClCompile:
1> main.cpp
1>c:\...\cppchecktest\c.h(12): warning C4101: 'j' : unreferenced local variable
1>c:\...\cppchecktest\main.cpp(6): warning C4101: 'n' : unreferenced local variable
1>c:\...\cppchecktest\main.cpp(13): warning C4101: 'n1' : unreferenced local variable
1>c:\...\cppchecktest\main.cpp(35): warning C6201: Index '2' is out of valid index range '0' to '1' for possibly stack allocated buffer 'arr'
1>c:\...\cppchecktest\main.cpp(17): warning C6001: Using uninitialized memory 'pInt1': Lines: 13, 16, 17
1>c:\...\cppchecktest\main.cpp(21): warning C6011: Dereferencing NULL pointer 'pInt2': Lines: 13, 16, 17, 20, 21
1>c:\...\cppchecktest\main.cpp(35): warning C6386: Buffer overrun: accessing 'arr', the writable size is '8' bytes, but '12' bytes might be written: Lines: 13, 16, 17, 20, 21, 24, 27, 30, 31, 34, 35
1>c:\...\cppchecktest\main.cpp(17): warning C4700: uninitialized local variable 'pInt1' used
1>ManifestResourceCompile:
1> All outputs are up-to-date.
1>Manifest:
1> All outputs are up-to-date.
1>LinkEmbedManifest:
1> All outputs are up-to-date.
1> CppcheckTest.vcxproj -> C:\...\CppcheckTest\Debug\CppcheckTest.exe
1>FinalizeBuildStatus:
1> Deleting file "Debug\CppcheckTest.unsuccessfulbuild".
1> Touching "Debug\CppcheckTest.lastbuildstate".
1>
1>Build succeeded.
1>
1>Time Elapsed 00:00:04.38
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
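
As a rough illustration of the defect classes in the report above, the following sketch shows the kind of lines that would be expected to trigger warnings C6201/C6386, C6001/C4700 and C6011 (kept as comments) next to a corrected form. It is not the original main.cpp; the names sumArray and correctedExample are made up for this illustration.

```cpp
#include <cstddef>

// Hypothetical helper: sums 'count' integers starting at 'values'.
int sumArray(const int* values, std::size_t count)
{
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i)
        sum += values[i];
    return sum;
}

// Corrected variants of the defect patterns; the offending forms are kept
// as comments next to the warnings they would be expected to raise.
int correctedExample()
{
    int arr[2] = {1, 2};
    // arr[2] = 3;          // C6201 / C6386: valid indexes are 0 and 1

    int value = 5;
    int* pInt1 = &value;    // 'int* pInt1; *pInt1 = 1;'        -> C6001 / C4700
    int* pInt2 = &value;    // 'int* pInt2 = NULL; *pInt2 = 1;' -> C6011

    *pInt1 += 1;            // value is now 6
    *pInt2 += 1;            // value is now 7
    return sumArray(arr, 2) + value;   // 3 + 7
}
```

Uncommenting any of the offending forms and rebuilding with code analysis enabled is a quick way to see which tool catches which class of defect.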


Links and References:
Analyzing Application Quality by Using Code Analysis Tools (MSDN)
How to: Enable and Disable Automatic Code Analysis for C/C++ (MSDN)
Cppcheck

Tuesday, 14 February 2012

XML Data Binding - Part 3: CodeSynthesis XSD example

In my previous article about XML Data Binding, I demonstrated how to use gSOAP in order to convert data from XML document into in-memory C++ objects and vice versa. Today I will show how to use another tool, CodeSynthesis XSD, to perform the same task.

CodeSynthesis XSD depends on the Apache Xerces-C++ XML parser, so you need to download and set up Xerces in your development environment first. Setup of both tools is described in the README.txt file that you can find after unpacking the downloaded CodeSynthesis XSD archive.

In order to compare the gSOAP and CodeSynthesis data binding processes, let's create a project that does the same XML processing as the gSOAP one: it loads XML, reads and displays data, adds a new element, displays the data again and saves the changes to XML.

We are going to use the same XML schema - library.xsd:



If XML documents use XML schema grammars, the Xerces parser requires them to specify the location of their XML schemas (by using an xsi:schemaLocation attribute if they use namespaces, and an xsi:noNamespaceSchemaLocation attribute if not).



NOTE: Make sure xml document and its schema are in the same directory (application's working directory).

Similar to the gSOAP case, we need to compile the schema into C++ classes. The CodeSynthesis schema compiler is xsd.exe and can be found in the bin directory of the package (e.g. ..\xsd-3.3.0-i686-windows\bin\). This directory should be added to the Path environment variable. We want to generate the C++/Tree mapping and code for serialization (object to XML; this code is not generated by default), so call the XSD compiler with the following parameters:

c:\test\XercesCodeSynthesis_Test1>xsd cxx-tree --generate-serialization library.xsd

It creates two files, library.hxx and library.cxx, and we need to include them into our project as they contain definitions of proxy classes.

Again, like we had with gSOAP, CodeSynthesis XSD generates classes that match XML document elements, by name and structure. We can see that in the header, library.hxx (some parts are omitted):



The schema compiler has also generated Library_ functions that serialize/deserialize data to/from an XML file.

main.cpp contains code that loads XML document into Library object, traverses through its member (vector) Books and displays all Book elements; it then adds a new Book to the collection, displays it again and serializes back to XML document in file:



Output:


Displaying all books in the library:

Book:
Title:Clean Code
Author:Robert C. Martin
ISBN: 0132350882
Copies available: 2

Book:
Title:The Pragmatic Programmer
Author:Andrew Hunt
ISBN: 020161622X
Copies available: 0

Book:
Title:Design patterns
Author:Erich Gamma
ISBN: 0201633612
Copies available: 1


Adding a new book:
Title: Effective C++
Author: Scott Meyers
ISBN:0321334876
Copies available: 50


Displaying all books in the library:

Book:
Title:Clean Code
Author:Robert C. Martin
ISBN: 0132350882
Copies available: 2

Book:
Title:The Pragmatic Programmer
Author:Andrew Hunt
ISBN: 020161622X
Copies available: 0

Book:
Title:Design patterns
Author:Erich Gamma
ISBN: 0201633612
Copies available: 1

Book:
Title:Effective C++
Author:Scott Meyers
ISBN: 0321334876
Copies available: 50

library.xml is changed - a new Book element has been added:



Links and References:
Boris Kolpackov: An Introduction to XML Data Binding in C++

Friday, 10 February 2012

How to read command line arguments with NSIS

GetParameters copies all command line arguments into the provided variable (as a string). GetOptions extracts the value of a specified option from the provided parameters string. The following example demonstrates how to get the value of the command line option "-s":

MyInstaller.nsi:



If we call this installer with following parameters:

"MyInstaller.exe" -t=12 -s=yes -k=abc

...a message box will display text: "s = yes".

GetOptions is not case sensitive so if we called it with "-S", the result would be the same. GetOptionsS is a case sensitive form of this function.
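
To make the described lookup semantics concrete, here is a small C++ sketch of the same idea. It is not NSIS code and not how NSIS implements GetOptions; getOption and toLower are hypothetical helpers, the option is assumed to be passed together with its "=" sign ("-s="), and values are assumed to end at the next space (quoted values are not handled).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-cases a copy of the input so that the search can ignore case.
std::string toLower(std::string text)
{
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Hypothetical helper: returns the text that follows 'option' in 'parameters',
// up to the next space, or an empty string if the option is not present.
std::string getOption(const std::string& parameters, const std::string& option)
{
    const std::string::size_type pos = toLower(parameters).find(toLower(option));
    if (pos == std::string::npos)
        return std::string();

    const std::string::size_type start = pos + option.size();
    const std::string::size_type end = parameters.find(' ', start);
    return parameters.substr(start, end == std::string::npos ? end : end - start);
}
```

With the parameters string from the example, getOption("-t=12 -s=yes -k=abc", "-s=") yields "yes", and so does a call where the command line contains "-S=yes" instead.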

Thursday, 9 February 2012

XML Data Binding - Part 2: gSOAP example

In my previous post, I explained the benefits of XML Data Binding. In this article I will show how to use gSOAP for conversion of data stored in XML format into objects and vice versa.

Download and unpack the latest gSOAP release package. In the previous article I said that XML Data Binding tools compile XML schemas and create C++ classes that represent XML elements. gSOAP's (Win32) compiler is located in ..\gsoap_2.8.6\gsoap-2.8\gsoap\bin\win32 directory and its full path (e.g. c:\tools\gsoap_2.8.6\gsoap-2.8\gsoap\bin\win32) should be added to Path environment variable. This directory contains two components of gSOAP XML compiler: wsdl2h.exe, which compiles XML schema to intermediate header file and soapcpp2.exe, which generates classes (in the header and source file that we need to include in our C++ project).

We need to get the XML schema from our XML file. Let's use an XML file based on the one from the previous article but modified by including a namespace and using a proper naming convention (title case for elements and camel case for attributes):

library.xml:



We can generate schema from this XML in Visual Studio: select XML item in the main menu and click on Create Schema in the drop down menu. Visual Studio generates schema document in Russian doll design style. It supports only this XSD design pattern because it is the most restrictive one.

library.xsd:



Save both XML and XSD files in the project directory.

Now let us compile schema. This is a two step process. Schema is passed to wsdl2h.exe which generates intermediate header. soapcpp2.exe uses that header to create proxy C++ classes for data binding (header file with their declarations and source file with their definitions).

We need to provide namespace used in our XML document ("gt") as gSOAP would otherwise use its generic namespace name ("ns1") when generating data type names and when serializing our data object back to the XML.

c:\test\gSOAP_Test1>wsdl2h.exe -t "c:\tools\gsoap_2.8.6\gsoap-2.8\gsoap\
typemap.dat" -N "gt" Library.xsd

** The gSOAP WSDL/Schema processor for C and C++, wsdl2h release 2.8.6
** Copyright (C) 2000-2011 Robert van Engelen, Genivia Inc.
** All Rights Reserved. This product is provided "as is", without any warranty.

** The wsdl2h tool is released under one of the following two licenses:
** GPL or the commercial license by Genivia Inc. Use option -l for details.

Saving Library.h

Reading type definitions from type map file 'c:\tools\gsoap_2.8.6\
gsoap-2.8\gsoap\typemap.dat'

Reading file 'Library.xsd'...
Done reading 'Library.xsd'

To complete the process, compile with:
> soapcpp2 Library.h
or to generate C++ proxy and object classes:
> soapcpp2 -j Library.h

We just need to follow the instruction given in the report above - call soapcpp2:

c:\test\gSOAP_Test1>soapcpp2 -I "c:\tools\gsoap_2.8.6\gsoap-2.8\gsoap\
import" Library.h

** The gSOAP code generator for C and C++, soapcpp2 release 2.8.6
** Copyright (C) 2000-2011, Robert van Engelen, Genivia Inc.
** All Rights Reserved. This product is provided "as is", without any warranty.

** The soapcpp2 tool is released under one of the following two licenses:
** GPL or the commercial license by Genivia Inc.

Saving soapStub.h annotated copy of the input declarations
Saving gt.nsmap namespace mapping table
Saving soapH.h interface declarations
Saving soapC.cpp XML serializers

Compilation successful

c:\test\gSOAP_Test1>

As we can see in the report, class declarations are in the following header:

soapStub.h (irrelevant parts omitted):



Comparing the original XML document with the generated classes, we can see the parallel between them: element types are mapped to classes; single child nodes and attributes are mapped to class members; a sequence is mapped to a vector. The _gt__Library class matches the Library element. Its members, Books and Staff, match the XML elements of the same name. In XML, these nodes are of the sequence type, so their C++ implementation (_gt__Library_Books and _gt__Library_Staff) uses an STL collection type (vector) to model them. The Books element contains Book elements, so _gt__Library_Books's vector member contains elements of type _gt__Library_Books_Book. In the same way, the Librarian element is mapped to the _gt__Library_Staff_Librarian class and _gt__Library_Staff's vector contains its instances.

Class names look a bit ugly, but that is because they are made by joining the namespace name ("gt") and the element type name. If a namespace isn't specified in the schema, gSOAP uses a generic namespace name - "ns1". If an element name contains an underscore, that character is replaced with "_USCORE" because gSOAP maps hyphens to normal underscores [source].

For a Russian doll styled schema, class members and vector elements are objects. This is not the case for schemas designed in Salami slice or Venetian blind styles: class members and vector elements are pointers to objects. This happens even if the minOccurs attribute is set to 1. I don't know how to force gSOAP to generate classes that enforce a composition relationship between classes for any design pattern of the provided schema.

I found here one explanation of gSOAP's reasoning: gSOAP generates pointers when it needs to be able to represent a NULL value. If you have defined an element with minOccurs = "0", then you will get a pointer generated in your code. You can then inspect this pointer. If it is NULL, then you know that the element is not present. Conversely, you can choose to set the pointer, or not, to indicate that the element is present or not.

Another author says: there are many different ways to define XML Schemas and the design choice can seriously impact the generation of implementation classes in the technology of your choice. There are different schema design styles such as the Russian Doll, Venetian Blind and Garden of Eden that can be followed.

Another generated header is soapH.h (snippet):



We can use these two generated functions to read the content of the root element (Library) into object and to write it back to the XML document. gSOAP compiler has done a great job for us!

Anyway, let's see what we can do with generated classes.

Before compiling your test project, make sure you have added the following paths to Additional Include Directories in Project Settings: c:\DEVELOPMENT\Toolkits\gsoap_2.8.6\gsoap-2.8\gsoap; c:\DEVELOPMENT\Toolkits\gsoap_2.8.6\gsoap-2.8\gsoap\import. Also, make sure you've included soapH.h, soapStub.h, soapC.cpp and stdsoap2.cpp into the project.

In order to use gSOAP engine, we need to create instance of gSOAP runtime context - struct soap. There is a sequence of commands that initialize and clean up this object so I wrapped it into RAII compliant class (CScopedSoap) which makes its usage exception safe.

Notice how easy it is to modify data when we are dealing with objects instead of digging through and traversing a DOM tree. Adding a new Book is nothing more than adding a new _gt__Library_Books_Book object at the end of the vector!

main.cpp:



Patch I applied in LoadXML() function is necessary as soap_read__gt__Library() for some reason sets mode of the standard input stream to BINARY although it reads data from a file stream. It never reverts stdio's mode back to TEXT. This has a consequence of getline() returning a string that contains Carriage Return character at the end and that character appears in new elements inserted into the XML document. I've posted a question about this on Stack Overflow and will update this article on this as soon as I clarify gSOAP's behaviour in this case.

This is the application's output:

Displaying all books in the library:

Book:
Title:Clean Code
Author:Robert C. Martin
ISBN: 0132350882
Copies available: 2

Book:
Title:The Pragmatic Programmer
Author:Andrew Hunt
ISBN: 020161622X
Copies available: 0

Book:
Title:Design patterns
Author:Erich Gamma
ISBN: 0201633612
Copies available: 1


Adding a new book:
Title: Effective C++
Author: Scott Meyers
ISBN:0321334876
Copies available: 50


Displaying all books in the library:

Book:
Title:Clean Code
Author:Robert C. Martin
ISBN: 0132350882
Copies available: 2

Book:
Title:The Pragmatic Programmer
Author:Andrew Hunt
ISBN: 020161622X
Copies available: 0

Book:
Title:Design patterns
Author:Erich Gamma
ISBN: 0201633612
Copies available: 1

Book:
Title:Effective C++
Author:Scott Meyers
ISBN: 0321334876
Copies available: 53

And XML contains a new Book element:

library.xml:



Links and References:

The gSOAP Toolkit for SOAP Web Services and XML-Based Applications
gSOAP 2.8.7 User Guide
Genivia gSOAP
Robert van Engelen: "gSOAP & Web Services"
gSOAP Yahoo group
gSOAP tagged question on Stack Overflow

Monday, 6 February 2012

XML Data Binding - Part 1: Why do we need it?

Your application uses some XML parsing tool (Xerces, libxml, TinyXML, TinyXML++, RapidXml, PugiXML,...) in order to load an XML file into a document object. You then traverse a DOM tree structure, look for nodes and their children and search their attributes so that you can modify the content (add, modify or delete elements or attributes)... and all this usually requires lots of loops and string comparisons, which creates huge, hard to maintain code. Wouldn't it be better if you could load the XML document into some object in memory whose attributes match the elements in your XML? You would then be dealing with (C++) objects instead of the complex DOM tree, which is much quicker, easier, type safe and less error prone.

Let's say we have some XML that keeps track of the state of the local library. To keep model simple, we can say that library comprises books and staff. Each book has its title, author and ISBN number. Each member of the staff, librarian, has a name. XML document could look like this:

library.xml


We can map each class of nodes into a C++ class whose members are the node's attributes and its children; siblings are stored in a vector. We need a single instance of the class which represents the root node. Its constructor loads the XML document from a file on disk and its destructor saves the (possibly modified) XML document back to the file. Our class model and use case might look like this:

main.cpp:



The XML document was loaded into an object, modified by adding a new book and saved back into the file in just three lines of code! Awesome! But this code is unfinished and doesn't actually work in real life, as I omitted the hardest bit: loading and parsing the XML in the library's constructor and serializing/marshalling the object back to the file in the destructor. All I wanted to show was how quick and easy it is to manipulate XML documents when representing them through objects - a concept known as XML Data Binding.
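
A minimal sketch of such a class model, deliberately without any XML loading or saving, might look like the following. The type and member names (Book, Librarian, Library, addBook, makeSampleLibrary) are assumptions of this sketch and not the article's original listing.

```cpp
#include <string>
#include <vector>

// One class per kind of XML node; members mirror its attributes and children.
struct Book
{
    std::string title;
    std::string author;
    std::string isbn;
};

struct Librarian
{
    std::string name;
};

// Root node: sibling elements of the same type are kept in vectors.
struct Library
{
    std::vector<Book> books;
    std::vector<Librarian> staff;

    // Adding an element is a plain container operation, no DOM traversal needed.
    void addBook(const std::string& title, const std::string& author,
                 const std::string& isbn)
    {
        Book book;
        book.title = title;
        book.author = author;
        book.isbn = isbn;
        books.push_back(book);
    }
};

// Hypothetical usage: builds an in-memory library and adds one book to it.
Library makeSampleLibrary()
{
    Library library;
    library.addBook("Effective C++", "Scott Meyers", "0321334876");
    return library;
}
```

The part that is still missing here, reading the members from XML and writing them back, is exactly what data binding tools generate.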

Another problem stems from the fact that for each new XML we would need to write completely new classes. Doing this manually is not the way to go, but luckily there are tools that do this automatically. They compile an XML schema (a document which describes the XML document itself) into a set of classes, following precise rules of mapping XML into objects (Object/XML Mapping or O/X Mapping). In the next two articles I will show how to use gSOAP and CodeSynthesis XSD for this.

Links and References:
XML Data Binding Tools

Monday, 30 January 2012

Thread and process synchronisation with mutex

In my previous article I described how to use semaphore in order to synchronise threads or processes. Mutex is a semaphore specialisation and can be used in the special case - when only one thread (or process) is allowed to access shared resource.

Mutex is a synchronisation object that controls resource access (critical section execution) by maintaining the knowledge of its current owner (accessor - thread or process). Mutex strictly limits access to a single accessor at a time.

Ownership over the mutex is controlled with following operations:
  • acquire() (or lock()) - passes the ownership of the mutex to the calling thread
  • release() (or unlock()) - calling thread gives up its ownership of mutex. Only thread that acquired the mutex can release it

Mutex has two states:
  • signalled - when no thread owns it; acquire() does not block
  • non-signalled - when owned by some thread; acquire() in other thread blocks till mutex gets signalled

When Thread1 acquires ownership of the mutex, it has an exclusive right to access the shared resource (to execute the code in the critical section). Once it's finished, it needs to release the mutex so another thread can acquire ownership and access the resource. If the thread does not release the mutex, other threads cannot acquire it and the application is in a deadlock. It is therefore very important to make sure that the mutex is released under any condition, regardless of the outcome of the code in the critical section! A RAII-compliant design can prevent the program's execution from leaving the critical section's scope without releasing the mutex.
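
The RAII idea can be sketched with the C++11 standard library, used here as a portable stand-in for the Win32 mutex objects this article is about. SharedCounter and incrementTwiceWithOneFailure are hypothetical names; std::lock_guard acquires the mutex in its constructor and releases it in its destructor, so the mutex is released even when the critical section throws.

```cpp
#include <mutex>
#include <stdexcept>

// A shared resource guarded by a mutex (names are illustrative only).
class SharedCounter
{
public:
    SharedCounter() : value_(0) {}

    // Critical section: the lock_guard releases the mutex on every exit path,
    // including the exception thrown below.
    void increment(bool fail)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ++value_;
        if (fail)
            throw std::runtime_error("failure inside the critical section");
    }

    int value()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    std::mutex mutex_;
    int value_;
};

// The second increment could never acquire the mutex if the first,
// failing call had left it locked.
int incrementTwiceWithOneFailure()
{
    SharedCounter counter;
    try
    {
        counter.increment(true);
    }
    catch (const std::runtime_error&)
    {
        // stack unwinding has already destroyed the lock_guard
    }
    counter.increment(false);
    return counter.value();
}
```

The same pattern applies to a Win32 mutex handle: a small wrapper class can acquire the mutex in its constructor and release it in its destructor.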

Semaphore behaves like shared mutex with a count controlled by multiple threads that are allowed to access shared resource. Mutex is stricter, only thread that owns it has the power to release it, and only that thread can access shared resource.

The following example shows how to use mutex in order to synchronise processes that write into the shared file (problem described in this article):

main.cpp:



Links and References:

Mutex Objects (MSDN)
Using Mutex Objects (MSDN)
Mutual exclusion (Wikipedia)

Thread and process synchronisation with semaphores

Semaphore is a synchronisation object that controls resource access (critical section execution) by maintaining the number (count, n) of accessors (threads or processes) (still) allowed to access the resource. While mutex strictly limits access to a single accessor at a time, semaphore allows up to N (N > 0) parallel accessors. N is defined when semaphore is being created and it represents maximum possible value of the semaphore count.

Semaphore count is changed through following operations:
  • wait() - which decreases count on successful return (minimum is 0)
  • release() - which increases count (maximum is N)

Semaphore has two states:
  • signalled - when 0 < n <= N; wait() does not block
  • non-signalled - when n == 0; wait() blocks till semaphore gets signalled (or returns on expired timeout)

There are two types of semaphores:
  • counting - based on the count 0 <= n <= N where N > 1
  • binary - (specialisation of counting) where N == 1; when in signalled state we say it is unlocked; when in non-signalled state we say it is locked

Semaphores are signalling accessors the right of way - just as traffic semaphores, but unlike them, accessors themselves are controlling when the light for others will become green - through wait() and release() operations. Accessors are wait()-ing for a semaphore. If semaphore is signalled (n > 0), wait() returns immediately, decreasing semaphore count. When accessor (this one or any other) is finished with the shared resource it calls release() on the semaphore, increasing its count. Accessor will block on wait()-ing if semaphore is non-signalled (when maximum allowed number of accessors are sharing the resource). As soon as some accessor finishes the work and releases the semaphore, accessor's wait() will unblock and it will be allowed to access the resource.
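
The wait()/release() mechanics described above can be sketched in portable C++11 with a mutex and a condition variable. This is only an illustration of the counting logic, not the Win32 semaphore object used in this article; the class name Semaphore and the helper twoOfThreeGetIn are assumptions of the sketch.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Minimal counting semaphore: count_ is the number of accessors still allowed in.
class Semaphore
{
public:
    Semaphore(int initialCount, int maxCount)
        : count_(initialCount), maxCount_(maxCount) {}

    // Blocks until the semaphore is signalled (count > 0) or the timeout expires.
    // Returns true and decreases the count on success, false on timeout.
    bool wait(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!signalled_.wait_for(lock, timeout, [this] { return count_ > 0; }))
            return false;
        --count_;
        return true;
    }

    // Increases the count and wakes one waiter.
    // Returns false if the count is already at its maximum.
    bool release()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == maxCount_)
            return false;
        ++count_;
        signalled_.notify_one();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable signalled_;
    int count_;
    const int maxCount_;
};

// With N == 2, two accessors get in and the third one is turned away
// until one of the first two releases the semaphore.
bool twoOfThreeGetIn()
{
    Semaphore semaphore(2, 2);
    const std::chrono::milliseconds noWait(0);
    const bool first = semaphore.wait(noWait);
    const bool second = semaphore.wait(noWait);
    const bool third = semaphore.wait(noWait);       // count is 0: times out
    semaphore.release();                             // count back to 1
    const bool thirdRetry = semaphore.wait(noWait);  // succeeds now
    return first && second && !third && thirdRetry;
}
```

Note that release() can be called by any thread, not only by one that previously returned from wait(); the semaphore has no notion of an owner.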

Obviously, if N is set to 1, semaphore (called binary semaphore in this case) logically behaves like a mutex, allowing only one accessor at a time. There is a difference between two of them though: only accessor that locked the mutex can unlock it (mutex is owned by the accessor), but any accessor can release semaphore.

The majority of semaphore examples on the internet are focused on the consumer-producer problem. I wanted to show the use of a semaphore on the example of traffic control - something that resembles the real semaphore. So, let's say we have three single-lane, one-way roads joining just before a bridge which is one-way but has two lanes. To reduce congestion on the bridge, traffic from only two access roads is allowed at a time. There is a semaphore with a red and a green light by each road and once it turns green, it remains in that state for some time period T. The first two opened roads get the green light first. As soon as the timeout expires for one of those roads and its semaphore shows red, the semaphore will show green for the traffic that has been waiting at the third road.

Semaphore-example

We can think of roads (traffic on them) as accessors and the bridge as a resource: two roads can lead traffic to the bridge at the same time (two accessors are allowed to access shared resource). Obviously, we will set the maximum value for the semaphore count to 2 in our model. Thread will wait() as long as count is 0 but as soon as it gets increased to 1, wait() returns (decreasing count to 0 again).

This example aims to show how semaphore limits parallel access to the shared resource and how accessors (threads in this case) themselves control semaphore by wait()-ing for the semaphore and release()-ing it.

To stop threads I tend to use an event object - not a flag (volatile bool variable). Events are thread-safe and thread callbacks don't need to return with a delay of one additional wait() cycle when termination has been requested.
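
The difference can be sketched in portable C++11, with a condition variable standing in for the Win32 event object. StopEvent, runWorker and workerIterations are hypothetical names. The worker pauses by waiting on the event instead of sleeping, so a stop request ends the pause immediately.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Manual-reset style event: once set, every wait returns immediately.
class StopEvent
{
public:
    StopEvent() : set_(false) {}

    void set()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            set_ = true;
        }
        changed_.notify_all();
    }

    // Returns true as soon as the event is set, false if the timeout expires first.
    bool waitFor(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        return changed_.wait_for(lock, timeout, [this] { return set_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    bool set_;
};

// Worker loop: the pause between work items is a wait on the stop event,
// so a stop request interrupts the pause instead of being noticed a cycle later.
int runWorker(StopEvent& stop, int maxIterations)
{
    int iterations = 0;
    while (iterations < maxIterations &&
           !stop.waitFor(std::chrono::milliseconds(1)))
    {
        ++iterations;   // placeholder for the real work
    }
    return iterations;
}

// Runs the worker with or without a stop request issued beforehand.
int workerIterations(bool stopFirst, int maxIterations)
{
    StopEvent stop;
    if (stopFirst)
        stop.set();
    return runWorker(stop, maxIterations);
}
```

A worker polling a plain flag would have to finish its sleep before looking at the flag again, which is the extra delay mentioned above.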

I wrapped event and semaphore objects (handles) into RAII-compliant classes - CScopedEvent and CScopedSemaphore.

main.cpp:




Output:

Road 1 opened
0 road(s) is(are) generating traffic...
Road 1 got green light for the next 10 seconds. Generating traffic...
1 road(s) is(are) generating traffic...
Road 2 opened
1 road(s) is(are) generating traffic...
Road 2 got green light for the next 10 seconds. Generating traffic...
2 road(s) is(are) generating traffic...
Road 3 opened
2 road(s) is(are) generating traffic...

Road 1 got red light. Waiting for a green light...
1 road(s) is(are) generating traffic...
Road 3 got green light for the next 10 seconds. Generating traffic...
2 road(s) is(are) generating traffic...

Road 2 got red light. Waiting for a green light...
1 road(s) is(are) generating traffic...
Road 1 got green light for the next 10 seconds. Generating traffic...
2 road(s) is(are) generating traffic...

Road 3 got red light. Waiting for a green light...
1 road(s) is(are) generating traffic...
Road 2 got green light for the next 10 seconds. Generating traffic...
2 road(s) is(are) generating traffic...

Road 1 got red light. Waiting for a green light...
1 road(s) is(are) generating traffic...
Road 3 got green light for the next 10 seconds. Generating traffic...
2 road(s) is(are) generating traffic...

Road 2 got red light. Waiting for a green light...
1 road(s) is(are) generating traffic...
Road 1 got green light for the next 10 seconds. Generating traffic...
2 road(s) is(are) generating traffic...

Road 3 got red light. Waiting for a green light...
1 road(s) is(are) generating traffic...
Road 2 got green light for the next 10 seconds. Generating traffic...
2 road(s) is(are) generating traffic...

Road 1 got red light. Waiting for a green light...
Road 3 got green light for the next 10 seconds. Generating traffic...
1 road(s) is(are) generating traffic...2 road(s) is(are) generating traffic...


Road 2 got red light. Waiting for a green light...
1 road(s) is(are) generating traffic...
Road 1 got green light for the next 10 seconds. Generating traffic...
2 road(s) is(are) generating traffic...

Road 3 got red light. Waiting for a green light...
1 road(s) is(are) generating traffic...
Road 2 got green light for the next 10 seconds. Generating traffic...
2 road(s) is(are) generating traffic...

Road 1 got red light. Waiting for a green light...
Road 3 got green light for the next 10 seconds. Generating traffic...
1 road(s) is(are) generating traffic...
2 road(s) is(are) generating traffic...

Road 2 got red light. Waiting for a green light...
1 road(s) is(are) generating traffic...
Road 1 got green light for the next 10 seconds. Generating traffic...
2 road(s) is(are) generating traffic...

Road 3 got red light. Waiting for a green light...
1 road(s) is(are) generating traffic...
Road 2 got green light for the next 10 seconds. Generating traffic...
2 road(s) is(are) generating traffic...
Closing road 3 ...
Road 3 got request to get closed
Road 3 closed
Closing road 2 ...

Road 1 got red light. Waiting for a green light...
1 road(s) is(are) generating traffic...
Road 1 got green light for the next 10 seconds. Generating traffic...
2 road(s) is(are) generating traffic...

Road 2 got red light. Waiting for a green light...
1 road(s) is(are) generating traffic...
Road 2 got request to get closed
Road 2 closed
Closing road 1 ...

Road 1 got red light. Waiting for a green light...
0 road(s) is(are) generating traffic...
Road 1 got request to get closed
Road 1 closed

The example above shows how semaphore controls access to the resource by multiple threads. But please note that resource was NOT made thread safe! Semaphore was just allowing up to N threads (2 in our case) to be active at a time (and generate the traffic towards the bridge). If we wanted to limit the number of vehicles on the bridge and to control traffic lights depending on the current bridge load, we would have had to limit the number of active threads to 1. In that case only one road would have green light at a time.

In the next example I want to show how to synchronise multiple processes in accessing shared resource. Let's say we have an app which writes some log into the file and does it in a loop. The code could look like this:

main.cpp:



If run with parameter of value e.g. "012345", this app will create text file with the following content:

test.log:

[PID = 4408] Iteration # 1 012345
[PID = 4408] Iteration # 2 012345
[PID = 4408] Iteration # 3 012345
[PID = 4408] Iteration # 4 012345
...

If we run simultaneously two or more instances of this process, they will all write into the same file, increasing its size with each write operation. Manipulator endl inserts new line character ('\n') at the end of the line and flushes the buffer to the disk. Obviously, before writing to the disk, our file stream object needs to know the current size of the file in order to move write pointer to the file end. If Process2 appends a new line to the file after Process1 reads file size but before Process1 writes into it, Process2 will increase file size but Process1 will know only about the previous file size and start writing at the position set accordingly, effectively overwriting Process2's last written line!
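
A sketch of such a logging loop is shown below. It is an assumption-laden reconstruction of the idea, not the original main.cpp: formatLogLine and appendLog are hypothetical names, and the process ID is passed in as a parameter to keep the code portable.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Builds one log line in the format shown in test.log above.
std::string formatLogLine(unsigned long pid, int iteration, const std::string& text)
{
    std::ostringstream line;
    line << "[PID = " << pid << "] Iteration # " << iteration << " " << text;
    return line.str();
}

// Appends 'iterations' lines to the file at 'path'. std::endl writes '\n'
// and flushes the stream after every line.
bool appendLog(const std::string& path, unsigned long pid, int iterations,
               const std::string& text)
{
    std::ofstream log(path.c_str(), std::ios::app);
    if (!log)
        return false;

    for (int i = 1; i <= iterations; ++i)
        log << formatLogLine(pid, i, text) << std::endl;

    return log.good();
}
```

Nothing in this loop coordinates with other processes writing to the same file, which is what makes the corruption described here possible.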

The following code is the content of the script (DOS batch file) which runs three instances of our application, providing each with different argument:

run_processes.bat:

@start Process.exe 012345
@start Process.exe ABCDEFHIJKLM
@start Process.exe 987654321

Arguments are of different length so we can easily detect the place of the corruption in the output file, like this one:

test.log:

...
[PID = 8280] Iteration # 214 ABCDEFHIJKLM
[PID = 8120] Iteration # 208 987654321
M
[PID = 8280] Iteration # 216 ABCDEFHIJKLM
...

What happened here? First of all, we need to know that on Windows, \n read from stream buffer is expanded to \r\n (CR-LF) before writing it to the file on disk. Process 8120 has updated its knowledge of file size but before it wrote iteration #208 log, process 8280 had written its log for iteration #215 so basically we had this:

test.log (showing hidden CR-LF characters):

...
[PID = 8280] Iteration # 214 ABCDEFHIJKLM\r\n
[PID = 8280] Iteration # 215 ABCDEFHIJKLM\r\n
...

Then process 8120 wrote its #208 log, effectively overwriting 8280's #215, after which 8280 wrote its log #216:

test.log (showing hidden CR-LF characters):

...
[PID = 8280] Iteration # 214 ABCDEFHIJKLM\r\n
[PID = 8120] Iteration # 208 987654321\r\nM\r\n
[PID = 8280] Iteration # 216 ABCDEFHIJKLM\r\n
...
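
The byte arithmetic behind this corruption can be replayed in memory. In the sketch below a std::string stands in for the file, and writeAt and replayLostUpdate are hypothetical helpers. The #215 line is 43 bytes long (including CR-LF) and the #208 line is 40 bytes long, so writing the shorter line at the stale offset leaves the last 3 bytes of the longer one, "M\r\n", in place.

```cpp
#include <string>

// Simulates a file as a string: writes 'data' at 'offset', overwriting existing
// bytes and growing the "file" if the data runs past its end.
void writeAt(std::string& file, std::string::size_type offset, const std::string& data)
{
    if (file.size() < offset + data.size())
        file.resize(offset + data.size());
    file.replace(offset, data.size(), data);
}

// Replays the sequence described above and returns the resulting file content.
std::string replayLostUpdate()
{
    std::string file = "[PID = 8280] Iteration # 214 ABCDEFHIJKLM\r\n";

    // Process 8120 determines the end of the file, but does not write yet.
    const std::string::size_type staleOffset = file.size();

    // Process 8280 appends its line for iteration #215.
    writeAt(file, file.size(), "[PID = 8280] Iteration # 215 ABCDEFHIJKLM\r\n");

    // Process 8120 now writes at the stale offset; its line is 3 bytes shorter,
    // so the last 3 bytes of the #215 line survive.
    writeAt(file, staleOffset, "[PID = 8120] Iteration # 208 987654321\r\n");

    // Process 8280 appends iteration #216 at the real end of the file.
    writeAt(file, file.size(), "[PID = 8280] Iteration # 216 ABCDEFHIJKLM\r\n");
    return file;
}
```

The returned string matches the corrupted fragment shown above, including the stray "M" line.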

Obviously, we need to protect the file so only one process is accessing it at a time. We can do that with a semaphore or a mutex shared between multiple processes (and therefore named).

In this article I will show how to achieve it with semaphore:

main.cpp:



All processes are competing to get the semaphore signal. The first process whose wait() returns (decreasing the semaphore count by 1 - possibly to the minimal value of 0, in which case all other processes block on their wait()) gets exclusive access to the file and updates its content, after which it releases the semaphore (increasing its count to 1 again). All processes are competing again and the one whose wait() returns first gets its slot of exclusive access. There is no corruption in the file any more:

test.log:

...
[PID = 6036] Iteration # 213 987654321
[PID = 10940] Iteration # 213 012345
[PID = 11644] Iteration # 213 ABCDEFHIJKLM
[PID = 6036] Iteration # 214 987654321
[PID = 10940] Iteration # 214 012345
[PID = 11644] Iteration # 214 ABCDEFHIJKLM
[PID = 6036] Iteration # 215 987654321
[PID = 10940] Iteration # 215 012345
[PID = 11644] Iteration # 215 ABCDEFHIJKLM
...

Note: Although this example uses semaphore for process synchronisation, mutex is here more natural solution - we don't want to limit number of accessors to several (N) but only to 1. Only process that locks the mutex can unlock it and with binary semaphore we are just emulating this behaviour.

Links and References:
Semaphore Objects (MSDN)
Using Semaphore Objects (MSDN)
Semaphore (Wikipedia)
Windows Thread Synchronization - Synchronization Using Semaphores
Joseph M. Newcomer: Semaphores
Mutex or Semaphore for Performance?
Mutex vs Semaphore
Mutex vs Semaphores
Difference between binary semaphore and mutex