Thread-Local Storage

Sometimes shaders need to share data and also modify it. Sharing the data using the first method works for constant data, but it does not permit changing the data during rendering because no consistent writes are possible: multiple instances of the shader in different threads may write to the same data simultaneously. This can corrupt even simple operations such as incrementing a variable, because the execution order is unpredictable. The sequence read_A write_A read_B write_B in threads A and B works, but there is no way to avoid the sequence read_A read_B write_A write_B, in which both threads read the same old value and the second write overwrites the first. This problem is called a race condition, and it can cause one increment to be lost in rare and hard-to-debug cases. Locking would prevent that but may cause an unacceptable performance loss.
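The two orderings can be replayed deterministically to see the lost increment. The following is a minimal sketch in plain C, not mental ray code: it uses no real threads, and the names sim_thread, sim_read, sim_write, run_serial_order and run_racy_order are hypothetical, introduced only for this illustration. Each simulated thread splits the increment into a read step and a write step so that the interleaving can be chosen by hand.

```c
/* One simulated thread: a private register holding the value it read. */
typedef struct {
    int reg;
} sim_thread;

/* The two halves of "counter++" as a thread would execute them. */
static void sim_read(sim_thread *t, const int *shared)
{
    t->reg = *shared;
}

static void sim_write(const sim_thread *t, int *shared)
{
    *shared = t->reg + 1;
}

/* read_A write_A read_B write_B: both increments survive. */
int run_serial_order(void)
{
    int shared = 0;
    sim_thread a, b;

    sim_read(&a, &shared);
    sim_write(&a, &shared);
    sim_read(&b, &shared);
    sim_write(&b, &shared);
    return shared;
}

/* read_A read_B write_A write_B: B's write overwrites A's result. */
int run_racy_order(void)
{
    int shared = 0;
    sim_thread a, b;

    sim_read(&a, &shared);
    sim_read(&b, &shared);
    sim_write(&a, &shared);
    sim_write(&b, &shared);
    return shared;
}
```

Calling run_serial_order() yields 2, while run_racy_order() yields 1: one of the two increments is lost, which is exactly the failure described above.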

Thread-local storage avoids the race condition by providing one copy of the data to each thread. Multiple threads can execute simultaneously, but within any single thread the execution is strictly sequential, so the read/write race condition cannot happen. Here is an example that counts shader calls:

DLLEXPORT miBoolean myshader(      /* main shader */
    miColor         *result,
    miState         *state,
    struct myshader *paras)
{
    int             *counter;

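    /* fetch this thread's counter; null if none was installed yet */
    mi_query(miQ_FUNC_TLS_GET, state, miNULLTAG, &counter);
    if (!counter) {
        /* first call in this thread: allocate and install the counter */
        counter = mi_mem_allocate(sizeof(int));
        mi_query(miQ_FUNC_TLS_SET, state, miNULLTAG, &counter);
        *counter = 0;
    }
    (*counter)++;                   /* no other thread sees this pointer */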
    ...
    return(miTRUE);
}

DLLEXPORT void myshader_init(      /* init shader */
    miState         *state,
    struct myshader *paras,
    miBoolean       *init_req)
{
    *init_req = miTRUE;             /* request instance init/exit calls */
}

DLLEXPORT void myshader_exit(      /* exit shader */
    miState         *state,
    struct myshader *paras)
{
    int             **counters;
    int             num, i, total = 0;

    if (!paras)                     /* shader exit, not instance exit */
        return;
    /* collect the counters installed by all threads */
    mi_query(miQ_FUNC_TLS_GETALL, state, miNULLTAG, &counters, &num);
    for (i=0; i < num; i++) {
        total += *counters[i];
        mi_mem_release(counters[i]);
    }
    mi_info("myshader was called %d times", total);
}

The thread-local data is a single integer that counts shader calls in this thread. Since init shaders are called once per shader or once per shader instance, but not once every time the shader is called in a new thread, the data cannot be installed and initialized in the init shader. Instead, it is created in the main body if it does not already exist. This is safe because no two threads will get the same pointer returned by miQ_FUNC_TLS_GET. (Note that setting *counter to zero is actually redundant because mi_mem_allocate always returns zeroed memory.)

The example exit shader collects the thread-local counters of all threads that installed a counter, then computes and prints the total. This is done during shader instance exit, not shader exit, which the exit shader detects by checking that paras is nonzero. This requires shader instance init/exit to be enabled in the init function by setting init_req to miTRUE.
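The get, install and collect pattern used by the main and exit shaders can also be tried outside mental ray. The following is a rough sketch using POSIX threads, under stated assumptions: the slot table and the functions slot_get, slot_set, slot_getall, count_call and run_counting_demo are hypothetical stand-ins for the three mi_query modes and the two shaders, and calloc/free stand in for mi_mem_allocate/mi_mem_release. It is not how mental ray implements thread-local storage internally.

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 16

/* Hypothetical stand-in for the renderer's per-thread slots.
 * Each thread touches only its own entry, so no locking is needed. */
static int *slots[MAX_THREADS];

static int *slot_get(int tid)           { return slots[tid]; }
static void slot_set(int tid, int *ptr) { slots[tid] = ptr; }

/* Copy all installed pointers to out; return how many were found. */
static int slot_getall(int **out)
{
    int i, num = 0;

    for (i = 0; i < MAX_THREADS; i++)
        if (slots[i])
            out[num++] = slots[i];
    return num;
}

/* Analog of the main shader body: create the counter on first use. */
static void count_call(int tid)
{
    int *counter = slot_get(tid);

    if (!counter) {
        counter = calloc(1, sizeof(int));   /* zeroed, like the original */
        if (!counter)
            abort();
        slot_set(tid, counter);
    }
    (*counter)++;
}

typedef struct { int tid; int calls; } worker_arg;

static void *worker(void *p)
{
    const worker_arg *arg = p;
    int i;

    for (i = 0; i < arg->calls; i++)
        count_call(arg->tid);
    return NULL;
}

/* Run nthreads workers making `calls` calls each, then sum and free
 * all counters (the analog of the exit shader). Returns the total,
 * or -1 if nthreads is out of range. */
int run_counting_demo(int nthreads, int calls)
{
    pthread_t  threads[MAX_THREADS];
    worker_arg args[MAX_THREADS];
    int       *found[MAX_THREADS];
    int        i, num, total = 0;

    if (nthreads < 0 || nthreads > MAX_THREADS)
        return -1;
    for (i = 0; i < nthreads; i++) {
        args[i].tid = i;
        args[i].calls = calls;
        if (pthread_create(&threads[i], NULL, worker, &args[i]) != 0)
            abort();
    }
    for (i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);

    num = slot_getall(found);
    for (i = 0; i < num; i++) {
        total += *found[i];
        free(found[i]);
    }
    for (i = 0; i < MAX_THREADS; i++)       /* reset for a later run */
        slots[i] = NULL;
    return total;
}
```

With four threads making 1000 calls each, run_counting_demo(4, 1000) returns 4000. No increment is lost even though the threads run concurrently, because each thread increments only its own counter and the totals are combined after all threads have finished.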

This will only work on a single host because each host exits its own shaders, and there is no way to communicate the counters between hosts. Moreover, slave hosts may come and go, and may call their exit shaders multiple times for a single frame.

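Thread-local shader storage relies on three new mi_query modes:

miQ_FUNC_TLS_GET      returns the pointer installed for the current thread, or a null pointer if none has been installed
miQ_FUNC_TLS_SET      installs a pointer for the current thread
miQ_FUNC_TLS_GETALL   returns an array of the pointers installed by all threads, and the number of entries in that array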

The second argument to mi_query must be the shader state, and the third must be a null tag.

Copyright © 1986, 2013 NVIDIA Corporation