Generate all combinations of a char array inside of a CUDA __device__ kernel

I need help please. I started to program a common brute forcer / password guesser with CUDA (2.3 / 3.0beta). I tried different ways to generate all possible plain text "candidates" of a defined ASCII char set.

In this sample code I want to generate all 74^4 possible combinations (and just output the result back to host/stdout).

$ ./combinations
Total number of combinations    : 29986576

Maximum output length   : 4
ASCII charset length    : 74

ASCII charset   : 0x30 - 0x7a
 "0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxy"

CUDA code (compiled with 2.3 and 3.0b - sm_10) - combinations.cu:

#include <stdio.h>
#include <cuda.h>

__device__ uchar4 charset_global = {0x30, 0x30, 0x30, 0x30};
__shared__ __device__ uchar4 charset[128];

__global__ void combo_kernel(uchar4 * result_d, unsigned int N)
{
 int totalThreads = blockDim.x * gridDim.x ;
 int tasksPerThread = (N % totalThreads) == 0 ? N / totalThreads : N/totalThreads + 1;
 int myThreadIdx = blockIdx.x * blockDim.x + threadIdx.x ;
 int endIdx = myThreadIdx + totalThreads * tasksPerThread ;
 if( endIdx > N) endIdx = N;

 const unsigned int m = 74 + 0x30;

 for(int idx = myThreadIdx ; idx < endIdx ; idx += totalThreads) {
  charset[threadIdx.x].x = charset_global.x;
  charset[threadIdx.x].y = charset_global.y;
  charset[threadIdx.x].z = charset_global.z;
  charset[threadIdx.x].w = charset_global.w;
  __threadfence();

  if(charset[threadIdx.x].x < m) {
   charset[threadIdx.x].x++;

  } else if(charset[threadIdx.x].y < m) {
   charset[threadIdx.x].x = 0x30; // = 0
   charset[threadIdx.x].y++;

  } else if(charset[threadIdx.x].z < m) {
   charset[threadIdx.x].y = 0x30; // = 0
   charset[threadIdx.x].z++;

  } else if(charset[threadIdx.x].w < m) {
   charset[threadIdx.x].z = 0x30;
   charset[threadIdx.x].w++;; // = 0
  }

  charset_global.x = charset[threadIdx.x].x;
  charset_global.y = charset[threadIdx.x].y;
  charset_global.z = charset[threadIdx.x].z;
  charset_global.w = charset[threadIdx.x].w;

  result_d[idx].x = charset_global.x;
  result_d[idx].y = charset_global.y;
  result_d[idx].z = charset_global.z;
  result_d[idx].w = charset_global.w;
 }
}

#define BLOCKS 65535
#define THREADS 128

int main(int argc, char **argv)
{
 const int ascii_chars = 74;
 const int max_len = 4;
 const unsigned int N = pow((float)ascii_chars, max_len);
 size_t size = N * sizeof(uchar4);

 uchar4 *result_d, *result_h;
 result_h = (uchar4 *)malloc(size );
 cudaMalloc((void **)&result_d, size );
 cudaMemset(result_d, 0, size);

 printf("Total number of combinations\t: %d\n\n", N);
 printf("Maximum output length\t: %d\n", max_len);
 printf("ASCII charset length\t: %d\n\n", ascii_chars);

 printf("ASCII charset\t: 0x30 - 0x%02x\n ", 0x30 + ascii_chars);
 for(int i=0; i < ascii_chars; i++)
  printf("%c",i + 0x30);
 printf("\n\n");

 combo_kernel <<< BLOCKS, THREADS >>> (result_d, N);
 cudaThreadSynchronize();

 printf("CUDA kernel done\n");
 printf("hit key to continue...\n");
 getchar();

 cudaMemcpy(result_h, result_d, size, cudaMemcpyDeviceToHost);

 for (unsigned int i=0; i<N; i++)
  printf("result[%06u]\t%c%c%c%c\n",i, result_h[i].x, result_h[i].y, result_h[i].z, result_h[i].w);

 free(result_h);
 cudaFree(result_d);
}

The code should compile without any problems, but the output is not what I expected.

On emulation mode:

CUDA kernel done
hit key to continue...

    result[000000]  1000 
...
    result[000128]  5000

On release mode:

CUDA kernel done
hit key to continue...

    result[000000]  1000 
...
    result[012288]  5000

I also tried __threadfence() and/or __syncthreads() at different places in the code, also without success...

PS: If possible, I want to generate everything inside the kernel function. I also tried pre-generating the possible plain text candidates in the host main function and copying them to the device with memcpy, but this works only with a very limited charset size (because of limited device memory).

  • Any idea about the output: why the repetition (even with __threadfence() or __syncthreads())?

  • Is there any other method to generate plain text candidates quickly inside a CUDA kernel :-) (~75^8)?

thanks a million

greets jan

Answer

Incidentally, your loop bound is overly complex. You don't need to do all that work to compute endIdx; instead you can do the following, which makes the code simpler.

for(int idx = myThreadIdx ; idx < N ; idx += totalThreads)
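
As a rough host-side check of that simplified loop, the sketch below simulates the stride pattern in plain C. The function name stride_covers_once is made up for illustration, and the "threads" are just an outer loop, so it says nothing about timing or memory ordering on a real device. It only shows that testing idx < N with a stride of totalThreads visits every index exactly once.

```c
#include <stdlib.h>

/* Simulates the simplified loop on the host: each of total_threads
 * "threads" starts at its own id and strides by total_threads.
 * Returns 1 if every index in [0, n) is visited exactly once, else 0.
 * (Hypothetical helper, not part of the original program.) */
int stride_covers_once(unsigned int n, unsigned int total_threads)
{
    unsigned char *visits;
    unsigned int tid, idx;
    int ok = 1;

    if (total_threads == 0)
        return n == 0;

    visits = (unsigned char *)calloc(n ? n : 1, 1);
    if (!visits)
        return 0;

    /* Outer loop plays the role of myThreadIdx, inner loop is the
     * simplified kernel loop from the answer. */
    for (tid = 0; tid < total_threads; tid++)
        for (idx = tid; idx < n; idx += total_threads)
            visits[idx]++;

    for (idx = 0; idx < n; idx++)
        if (visits[idx] != 1)
            ok = 0;

    free(visits);
    return ok;
}
```

Calling stride_covers_once(1000, 128) covers the case where N is not a multiple of the thread count, which is the case the endIdx computation was trying to handle.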

Let's see:

  • When filling your charset array, __syncthreads() will be sufficient, as you are not interested in writes to global memory (more on this later).
  • Your if statements do not reset your loop iterators correctly:
    • In the z < m branch, both x == m and y == m, so both must be reset to 0.
    • The same applies to w, where x, y and z must all be reset.
  • Each thread is responsible for writing one set of 4 characters in charset, but every thread writes the same 4 values. No thread does any independent work.
  • You are writing each thread's results to global memory without atomics, which is unsafe. There is no guarantee that the results won't be clobbered by another thread before you read them back.
  • You are reading the results of the computation back from global memory immediately after writing them there. It's unclear why you are doing this, and it is very unsafe.
  • Finally, there is no reliable way in CUDA to synchronize between all blocks, which seems to be what you are hoping for. __threadfence() only applies to blocks currently executing on the device, which can be a subset of all the blocks that should run for a kernel call. Thus it doesn't work as a synchronization primitive.

It's probably easier to calculate initial values of x, y, z and w for each thread. Then each thread can start looping from its initial values until it has performed tasksPerThread iterations. Writing the values out can probably proceed more or less as you have it now.
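
One possible way to get those per-thread values, assuming the charset is the contiguous range from the question (74 characters starting at 0x30), is to treat the index as a base-74 number and peel off one digit per position. The names index_to_candidate, CHARSET_FIRST and CHARSET_LEN are hypothetical and not from the original code. This is a sketch in plain C so the logic can be checked on the host; in a kernel the function would presumably be marked __device__ and called with the idx of the loop.

```c
#define CHARSET_FIRST 0x30 /* '0', first character of the charset (from the question) */
#define CHARSET_LEN   74   /* 0x30 .. 0x79, as in the question */

/* Writes the idx-th candidate of length len into out and NUL-terminates it.
 * out must have room for len + 1 bytes. Position 0 varies fastest.
 * (Hypothetical helper for illustration.) */
void index_to_candidate(unsigned long long idx, char *out, int len)
{
    int pos;

    for (pos = 0; pos < len; pos++) {
        /* Lowest base-74 digit selects the character at this position. */
        out[pos] = (char)(CHARSET_FIRST + idx % CHARSET_LEN);
        idx /= CHARSET_LEN;
    }
    out[len] = '\0';
}
```

Because each candidate depends only on its own index, no thread has to read or update a shared counter such as charset_global. The index is 64-bit on purpose: a search space of about 75^8 is roughly 1.0e15, which does not fit in 32 bits.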

EDIT: Here is a simple test program to demonstrate the logic errors in your loop iteration:

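#include <stdio.h>

int main(void)
{
    int m = 2;
    int x = 0, y = 0, z = 0, w = 0;

    // Same branch structure as the kernel; the bugs are left in on purpose.
    for (int i = 0; i < m * m * m * m; i++)
    {
        printf("x: %d y: %d z: %d w: %d\n", x, y, z, w);
        if(x < m) {
            x++;
        } else if(y < m) {
            x = 0;
            y++;
        } else if(z < m) {
            y = 0; // x is not reset here
            z++;
        } else if(w < m) {
            z = 0; // x and y are not reset here
            w++;
        }
    }
    return 0;
}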

The output of which is this:

x: 0 y: 0 z: 0 w: 0
x: 1 y: 0 z: 0 w: 0
x: 2 y: 0 z: 0 w: 0
x: 0 y: 1 z: 0 w: 0
x: 1 y: 1 z: 0 w: 0
x: 2 y: 1 z: 0 w: 0
x: 0 y: 2 z: 0 w: 0
x: 1 y: 2 z: 0 w: 0
x: 2 y: 2 z: 0 w: 0
x: 2 y: 0 z: 1 w: 0
x: 0 y: 1 z: 1 w: 0
x: 1 y: 1 z: 1 w: 0
x: 2 y: 1 z: 1 w: 0
x: 0 y: 2 z: 1 w: 0
x: 1 y: 2 z: 1 w: 0
x: 2 y: 2 z: 1 w: 0
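
For comparison, here is a minimal sketch of carry logic that avoids the two problems visible in the listing above: x reaching m itself, and x keeping its old value when z is incremented. It assumes digits run from 0 to m - 1 and that position 0 (x) varies fastest. The function name next_combination is made up for illustration.

```c
/* Advances a 4-digit counter by one step. Each digit stays in 0 .. m-1
 * and d[0] varies fastest. Returns 1 on success, or 0 when the counter
 * has wrapped around to all zeros. (Hypothetical helper.) */
int next_combination(int d[4], int m)
{
    int pos;

    for (pos = 0; pos < 4; pos++) {
        if (d[pos] + 1 < m) {
            d[pos]++;   /* no carry needed, done */
            return 1;
        }
        d[pos] = 0;     /* digit exhausted: reset it and carry onwards */
    }
    return 0;           /* every digit was at m-1, counter wrapped */
}
```

Starting from {0, 0, 0, 0} with m = 2, repeated calls step through all 16 combinations before the function returns 0. The loop resets every lower digit on the way to the digit that is incremented, which is the reset the if/else chain misses.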



