For architectures that use TLS variant II (i.e. amd64, ia32 and
sparc64), fix the way the TLS and TCB are allocated. The TLS is now allocated
using memalign() with the alignment specified in _tls_alignment. The size of
the TLS data itself is rounded up to a multiple of _tls_alignment.
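
A minimal sketch of the allocation scheme described above, not the actual
patch. It assumes the TCB is placed directly after the rounded-up TLS data,
as variant II expects. The names tcb_t, align_up(), tls_alloc_variant_2() and
tls_free_variant_2() are hypothetical, and C11 aligned_alloc() stands in for
memalign() so that the example builds on a hosted system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical TCB layout; the real structure is architecture-specific. */
typedef struct {
	void *self;      /* variant II: the thread pointer refers to the TCB */
	void *tls_base;  /* start of the allocation, kept so it can be freed */
} tcb_t;

/* Round size up to the next multiple of align (a power of two). */
static size_t align_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/*
 * Allocate the TLS data with the TCB immediately after it.
 * Assumption: aligned_alloc() is used here in place of memalign().
 */
static tcb_t *tls_alloc_variant_2(size_t tls_size, size_t tls_alignment)
{
	/* Never align less strictly than the TCB itself requires. */
	size_t align = tls_alignment;
	if (align < _Alignof(tcb_t))
		align = _Alignof(tcb_t);

	/* TLS data size becomes a multiple of the alignment, so the TCB
	 * that follows it starts on an aligned address as well. */
	size_t data_size = align_up(tls_size, align);

	/* aligned_alloc() wants the total size to be a multiple of align. */
	size_t total = align_up(data_size + sizeof(tcb_t), align);

	void *base = aligned_alloc(align, total);
	if (base == NULL)
		return NULL;
	memset(base, 0, total);

	tcb_t *tcb = (tcb_t *) ((uint8_t *) base + data_size);
	tcb->self = tcb;
	tcb->tls_base = base;
	return tcb;
}

/* Release a block obtained from tls_alloc_variant_2(). */
static void tls_free_variant_2(tcb_t *tcb)
{
	if (tcb != NULL)
		free(tcb->tls_base);
}
```

Because the data size is rounded up before the TCB is placed, both the start
of the TLS block and the TCB address end up aligned to _tls_alignment, and
thread-local variables sit at negative offsets from the TCB.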