Dreamer GUNDAM의 블로그::무료로 뉴타입이 되는 별까지 앞으로 356광년

'Computer Vision'에 해당되는 글 34건

2014.10.04 증강 현실 테스트
2014.06.01 OpenCL : Using work item to partition data (5) 1
2014.05.02 OpenCL::Query OpenCL kernel (4)
2014.03.31 OpenCL Test Program (3)
2014.03.18 OpenCL Test Program (2)
2014.03.15 OpenCL test program (1)
2014.03.15 OpenCL Architecture
2013.05.28 Optical Flow에 의한 영상 정보 분석
2013.05.19 OpenCV를 이용한 Face Detection
2011.10.05 3D Noise Reduction algorithm test

Computer Vision2014. 10. 4. 09:28

증강 현실 테스트

AR관련 테스트 잔행.

몇가지 샘플을 돌려보니 마커를 인식하는 카메라 영상이 중요합니다.

증강 현실의 재현에 상당히 영향을 줍니다.

결국 카메라 튜닝이 관건...

저작자표시

'Computer Vision' 카테고리의 다른 글

OpenCL : Using work item to partition data (5) (1)	2014.06.01
OpenCL::Query OpenCL kernel (4) (0)	2014.05.02
OpenCL Test Program (3) (0)	2014.03.31
OpenCL Test Program (2) (0)	2014.03.18
OpenCL test program (1) (0)	2014.03.15

Posted by GUNDAM_IM

Computer Vision2014. 6. 1. 06:41

OpenCL : Using work item to partition data (5)

1차원 데이터 열에 대한 커널의 실행은

clEnqueueTask

2혹은 3차원 데이터에 대한 커널의 실행은

clEnqueueNDRangeKernel

에 의해서 실행 된다.

`cl_int clEnqueueNDRangeKernel (`	cl_command_queue `command_queue`,
	cl_kernel `kernel`,
	cl_uint `work_dim`,
	const size_t `*global_work_offset`,
	const size_t `*global_work_size`,
	const size_t `*local_work_size`,
	cl_uint `num_events_in_wait_list`,
	const cl_event `*event_wait_list`,
	cl_event `*event)`

command_queue : 커멘드 큐…

kernel : Kernel

work_dim : 0 보다는 커야 하지만 3보다는 작거나 같아야 한다.

global work offset : 현재 버전에서는 그냥 0 이 되어야 한다.

global_work_size : 포인터로 정의되고

work_dim의 영향을 받는다. 따라서

global_work_size[0] * global_work_size[1] …. * global_work_size[work_dim-1]

이 전체 work_itme의 개수이다.

global work_size는 sizeof(size_t)를 초과할 수 없다.

무슨 말이냐 하면, 만약 특정 디바이스에서 size를 나타내는 최대 크기가 32이면

즉 2^32이면 이보다는 작아야 한다는 것이다.

sizeof(size_t)는 결국 메모리 공간의 포인터의 크기를 의미하는데 32보다 크면

포인터로 지정할 수 있는 공간보다 크다는

의미이므로 오류가 되는 것이다.

이런 경우에 CL_OUT_OF_RESOURCES 오류가 발생한다.

local_work_size : work_dim의 배열을 가리키는 unsigned 값이다.

work_item 숫자이고 이것은 work-group을 만든다. 그리고 커널이 실행되는 횟수이다.

global_work_size와 마찬가지로

local_work_size[0] * local_work_size[1] …. * local_work_size[work_dim-1]

으로 해서 work_item의 숫자가 결졍된다. 그리고 이 숫자는

CL_DEVICE_MAX_WORK_GROUP_SIZE보다 작아야 한다.

local_work_size[0] <= CL_DEVICE_MAX_WORK_GROUP_SIZE[0]

local_work_size[1] <= CL_DEVICE_MAX_WORK_GROUP_SIZE[1]

….

local_work_size[2] <= CL_DEVICE_MAX_WORK_GROUP_SIZE[2]

local_work_size는 global_work_item을 workgroup instance로 나누어서 할당하는것을

결정하는데 사용한다.

만약 local_work_size가 명시되어 있다면, global_work_size[0] ,

global_work_size[1] … global_work_size[work_dim-1]은

local_work_size[0] , local_work_size[1] … local_work_size[2]의 값으로 나누어

떨어져야 한다.

work-group size는 커널이 프로그램 소스에서

__attribute__((reqd_work_group_size(X,Y,Z))) 에 의해서 사용할 수 있다.

이 경우 local_work_size는 이 reqd_work_group_size에 명시된 숫자와 같아야 하낟.

귀찮으면 OpenCL이 직접 실행시에 지정할 수 있다. 이럴 경우에 NULL로 명시한다.

(주) global work size는 결국 작업해야 할 전체 work item의 개수를 의미한다.

이를 local work size로 나누어서 분할하여 작업하게 되는 것이다.

각각의 차원을 chunk로 나눌 것이다. 이때 나누어지는 chunk의 개수는 2,3,4….CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS 까지 가능하다.

이 값은 아래와 같은 방법으로 확인이 가능하다.

/* the cl_device_info are preprocessor directives defined in cl.h */
switch (param_name) {

…

case CL_DEVICE_VENDOR_ID :
case CL_DEVICE_MAX_COMPUTE_UNITS :

  case CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS : {
            cl_uint* ret = (cl_uint*) alloca(sizeof(cl_uint) * paramSize);
            error = clGetDeviceInfo( id, param_name, paramSize, ret, NULL );
            if (error != CL_SUCCESS ) {
                perror("Unable to obtain device info for param\n");
                return;
            }
            switch (param_name) {
                case CL_DEVICE_VENDOR_ID: printf("\tVENDOR ID: 0x%x\n", *ret); break;
                case CL_DEVICE_MAX_COMPUTE_UNITS: printf("\tMaximum number of parallel compute units: %d\n", *ret); break;
                case CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:

printf("\tMaximum dimensions for global/local work-item IDs: %d\n", *ret); break;
}
}break;

processing group을 병렬로 실행할 수 있고, 이 것을 work group으로 부른다. work group은 work item이라 부르는 개별적인 병렬 실행 단위를 가진다. 저자는 이 것을 executable thread라고 개념적으로 보고 있다.

위의 예에서 보면 2차원으로 구성된 공간을 9개의 work group으로 나눈다. 각각의 work group은 4x4의 work item으로 구성된다.

work item을 할당하는 데에는 2가지 방법이 있다.

하나의 work item을 하나의 work group에 할당하는 방법과

n x n 개의 work item을 (이 경우 4x4가 된다.) work group에 할당하는 방법이 있다.

우선 첫번째 방식으로 진행한다고 가정하고 진행한다.

…. 코드…

프로그램을 실행시키면 오류가 발생한다.

Number of OpenCL platforms found: 1
Number of detected OpenCL devices: 3
Running on CPU ........
=> Kernel name: copy2Dfloat4 with arity: 2
=> About to create command queue and enqueue this kernel...
Unable to enqueue task to command-queue: No such file or directory

...

어떤 원인으로 발생된지 모르므로 이를 다시 확인하기 위해서 코드를 추가하였다.

……..
void Check_clEnqueueNDRangeKernel_Error(cl_int error)
{

    switch( error )
    {
        case CL_INVALID_PROGRAM_EXECUTABLE : printf("CL_INVALID_PROGRAM_EXECUTABLE\n"); break;
        case CL_INVALID_COMMAND_QUEUE      : printf("CL_INVALID_COMMAND_QUEUE\n"); break;
        case CL_INVALID_KERNEL            : printf("CL_INVALID_KERNEL\n");break;
        case CL_INVALID_CONTEXT            : printf("CL_INVALID_CONTEXT\n");break;
        case CL_INVALID_KERNEL_ARGS        : printf("CL_INVALID_KERNEL_ARGS\n"); break;
        case CL_INVALID_WORK_DIMENSION    : printf("CL_INVALID_WORK_DIMENSION\n"); break;
        case CL_INVALID_WORK_GROUP_SIZE    : printf("CL_INVALID_WORK_GROUP_SIZE\n") ; break;
        case CL_INVALID_WORK_ITEM_SIZE    : printf("CL_INVALID_WORK_ITEM_SIZE\n"); break ;
        case CL_INVALID_GLOBAL_OFFSET      : printf("CL_INVALID_GLOBAL_OFFSET\n"); break;
        case CL_OUT_OF_RESOURCES          : printf("CL_OUT_OF_RESOURCES\n"); break;
        case CL_MEM_OBJECT_ALLOCATION_FAILURE : printf("CL_MEM_OBJECT_ALLOCATION_FAILURE\n");break;
        case CL_INVALID_EVENT_WAIT_LIST    : printf("CL_INVALID_EVENT_WAIT_LIST\n");break;
        case CL_OUT_OF_HOST_MEMORY        : printf("CL_OUT_OF_HOST_MEMORY\n");break;
    }
}

….

추가된 코드에 의해서

과 같은 것을 얻을 수 있다.

확인하기 위해서 코드를 보면

…

            /* Enqueue the kernel to the command queue */
                size_t globalThreads[2];
                globalThreads[0]=1024;
                globalThreads[1]=1024;
                size_t localThreads[2];
                localThreads[0] = 64;
                localThreads[1] = 2;

                cl_event evt;
            error = clEnqueueNDRangeKernel( cQ, // command queue
                                               kernels[j], // kernel
                                               2, // work_dim
                                               0, // global work_offset
                                               globalThreads, // global work size
                                               localThreads, // local work_size
                                               0, // event related parameter
                                               NULL, &evt);
                clWaitForEvents(1, &evt);
            if (error != CL_SUCCESS) {
                perror("Unable to enqueue task to command-queue");
                    Check_clEnqueueNDRangeKernel_Error(error);
                exit(1);
            }

…

으로 되어 있다.

계속 같은 에러가 나타나서 다음과 같이 디바이스 정보를 얻어내는 함수를 넣고 실행을 해 본다.

void displayDeviceDetails(cl_device_id id,
                          cl_device_info param_name,
                          const char* paramNameAsStr) {
    cl_int error = 0;
    size_t paramSize = 0;

    error = clGetDeviceInfo( id, param_name, 0, NULL, &paramSize );
    if (error != CL_SUCCESS ) {
        perror("Unable to obtain device info for param\n");
        return;
    }

    /* the cl_device_info are preprocessor directives defined in cl.h */
    switch (param_name) {
        case CL_DEVICE_TYPE: {
            cl_device_type* devType = (cl_device_type*) alloca(sizeof(cl_device_type) * paramSize);
            error = clGetDeviceInfo( id, param_name, paramSize, devType, NULL );
            if (error != CL_SUCCESS ) {
                perror("Unable to obtain device info for param\n");
                return;
            }
            switch (*devType) {
                case CL_DEVICE_TYPE_CPU : printf("CPU detected\n");break;
                case CL_DEVICE_TYPE_GPU : printf("GPU detected\n");break;
                case CL_DEVICE_TYPE_ACCELERATOR : printf("Accelerator detected\n");break;
                case CL_DEVICE_TYPE_DEFAULT : printf("default detected\n");break;
            }
        }break;
        case CL_DEVICE_VENDOR_ID :
        case CL_DEVICE_MAX_COMPUTE_UNITS :
        case CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS : {
            cl_uint* ret = (cl_uint*) alloca(sizeof(cl_uint) * paramSize);
            error = clGetDeviceInfo( id, param_name, paramSize, ret, NULL );
            if (error != CL_SUCCESS ) {
                perror("Unable to obtain device info for param\n");
                return;
            }
            switch (param_name) {
                case CL_DEVICE_VENDOR_ID: printf("\tVENDOR ID: 0x%x\n", *ret); break;
                case CL_DEVICE_MAX_COMPUTE_UNITS: printf("\tMaximum number of parallel compute units: %d\n", *ret); break;
                case CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: printf("\tMaximum dimensions for global/local work-item IDs: %d\n", *ret); break;
            }
        }break;
        case CL_DEVICE_MAX_WORK_ITEM_SIZES : {
            cl_uint maxWIDimensions;
            size_t* ret = (size_t*) alloca(sizeof(size_t) * paramSize);
            error = clGetDeviceInfo( id, param_name, paramSize, ret, NULL );

            error = clGetDeviceInfo( id, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, sizeof(cl_uint), &maxWIDimensions, NULL );
            if (error != CL_SUCCESS ) {
                perror("Unable to obtain device info for param\n");
                return;
            }
            printf("\tMaximum number of work-items in each dimension: ( ");
            for(cl_int i =0; i < maxWIDimensions; ++i ) {
                printf("%d ", ret[i]);
            }
            printf(" )\n");
        }break;
        case CL_DEVICE_MAX_WORK_GROUP_SIZE : {
            size_t* ret = (size_t*) alloca(sizeof(size_t) * paramSize);
            error = clGetDeviceInfo( id, param_name, paramSize, ret, NULL );
            if (error != CL_SUCCESS ) {
                perror("Unable to obtain device info for param\n");
                return;
            }
            printf("\tMaximum number of work-items in a work-group: %d\n", *ret);
        }break;
        case CL_DEVICE_NAME :
        case CL_DEVICE_VENDOR : {
            char data[48];
            error = clGetDeviceInfo( id, param_name, paramSize, data, NULL );
            if (error != CL_SUCCESS ) {
                perror("Unable to obtain device name/vendor info for param\n");
                return;
            }
            switch (param_name) {
                case CL_DEVICE_NAME : printf("\tDevice name is %s\n", data);break;
                case CL_DEVICE_VENDOR : printf("\tDevice vendor is %s\n", data);break;
            }
        } break;
        case CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: {
            cl_uint* size = (cl_uint*) alloca(sizeof(cl_uint) * paramSize);
            error = clGetDeviceInfo( id, param_name, paramSize, size, NULL );
            if (error != CL_SUCCESS ) {
                perror("Unable to obtain device name/vendor info for param\n");
                return;
            }
            printf("\tDevice global cacheline size: %d bytes\n", (*size)); break;
        } break;
        case CL_DEVICE_GLOBAL_MEM_SIZE:
        case CL_DEVICE_MAX_MEM_ALLOC_SIZE: {
            cl_ulong* size = (cl_ulong*) alloca(sizeof(cl_ulong) * paramSize);
            error = clGetDeviceInfo( id, param_name, paramSize, size, NULL );
            if (error != CL_SUCCESS ) {
                perror("Unable to obtain device name/vendor info for param\n");
                return;
            }
            switch (param_name) {
                case CL_DEVICE_GLOBAL_MEM_SIZE: printf("\tDevice global mem: %ld mega-bytes\n", (*size)>>20); break;
                case CL_DEVICE_MAX_MEM_ALLOC_SIZE: printf("\tDevice max memory allocation: %ld mega-bytes\n", (*size)>>20); break;
            }
        } break;

    } //end of switch

}

void displayDeviceInfo(cl_platform_id platform_id,
                       cl_device_type dev_type) {
    /* OpenCL 1.1 device types */
    cl_int error = 0;
    cl_uint numOfDevices = 0;

    /* Determine how many devices are connected to your platform */
    error = clGetDeviceIDs(platform_id, dev_type, 0, NULL, &numOfDevices);
    if (error != CL_SUCCESS ) {
        perror("Unable to obtain any OpenCL compliant device info");
        exit(1);
    }
    cl_device_id* devices = (cl_device_id*) alloca(sizeof(cl_device_id) * numOfDevices);

    /* Load the information about your devices into the variable 'devices' */
    error = clGetDeviceIDs(platform_id, dev_type, numOfDevices, devices, NULL);
    if (error != CL_SUCCESS ) {
        perror("Unable to obtain any OpenCL compliant device info");
        exit(1);
    }
    printf("Number of detected OpenCL devices: %d\n", numOfDevices);
    /* We attempt to retrieve some information about the devices. */
    for(int i = 0; i < numOfDevices; ++ i ) {
        displayDeviceDetails( devices[i], CL_DEVICE_TYPE, "CL_DEVICE_TYPE" );
        displayDeviceDetails( devices[i], CL_DEVICE_NAME, "CL_DEVICE_NAME" );
        displayDeviceDetails( devices[i], CL_DEVICE_VENDOR, "CL_DEVICE_VENDOR" );
        displayDeviceDetails( devices[i], CL_DEVICE_VENDOR_ID, "CL_DEVICE_VENDOR_ID" );
        displayDeviceDetails( devices[i], CL_DEVICE_MAX_MEM_ALLOC_SIZE, "CL_DEVICE_MAX_MEM_ALLOC_SIZE" );
        displayDeviceDetails( devices[i], CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, "CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE" );
        displayDeviceDetails( devices[i], CL_DEVICE_GLOBAL_MEM_SIZE, "CL_DEVICE_GLOBAL_MEM_SIZE" );
        displayDeviceDetails( devices[i], CL_DEVICE_MAX_COMPUTE_UNITS, "CL_DEVICE_MAX_COMPUTE_UNITS" );
        displayDeviceDetails( devices[i], CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS" );
        displayDeviceDetails( devices[i], CL_DEVICE_MAX_WORK_ITEM_SIZES, "CL_DEVICE_MAX_WORK_ITEM_SIZES" );
        displayDeviceDetails( devices[i], CL_DEVICE_MAX_WORK_GROUP_SIZE, "CL_DEVICE_MAX_WORK_GROUP_SIZE" );
    }
}

결과는 아래와 같다.

Number of OpenCL platforms found: 1
Number of detected OpenCL devices: 3
Running on CPU ........
Number of detected OpenCL devices: 3
CPU detected
Device name is Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
Device vendor is Intel
VENDOR ID: 0xffffffff
Device max memory allocation: 4096 mega-bytes
Device global cacheline size: 6291456 bytes
Device global mem: 16384 mega-bytes
Maximum number of parallel compute units: 8
Maximum dimensions for global/local work-item IDs: 3
Maximum number of work-items in each dimension: ( 1024 1 1 )
Maximum number of work-items in a work-group: 1024
GPU detected
Device name is GeForce GT 650M
Device vendor is NVIDIA
VENDOR ID: 0x1022700
Device max memory allocation: 256 mega-bytes
Device global cacheline size: 0 bytes
Device global mem: 1024 mega-bytes
Maximum number of parallel compute units: 2
Maximum dimensions for global/local work-item IDs: 3
Maximum number of work-items in each dimension: ( 1024 1024 64 )
Maximum number of work-items in a work-group: 1024
GPU detected
Device name is HD Graphics 4000
Device vendor is Intel
VENDOR ID: 0x1024400
Device max memory allocation: 256 mega-bytes
Device global cacheline size: 0 bytes
Device global mem: 1024 mega-bytes
Maximum number of parallel compute units: 16
Maximum dimensions for global/local work-item IDs: 3
Maximum number of work-items in each dimension: ( 512 512 512 )
Maximum number of work-items in a work-group: 512
numOfKernels : 1
=> Kernel name: copy2Dfloat4 with arity: 2
=> About to create command queue and enqueue this kernel…

처음에는 잘 몰랐는데 보면 work item의 차원 수와 각 차원의 갯수를 알수 있다.

우선 CPU에서 실행하게 되므로 {1024 , 1 , 1 }이 할당 될 수 있다.

앞서의 코드에서

  size_t globalThreads[2];
                globalThreads[0]=1024;
                globalThreads[1]=1024;
                size_t localThreads[2];
                localThreads[0] = 64;
                localThreads[1] = 2;

이 되어 있으므로 localThread를 2를 할당해서 그런 것이다.

수정을 해서 진행하면

넘어간다.

…

  cl_event evt;
error = clEnqueueNDRangeKernel(cQ,
                                               kernels[j],
                                               2,
                                               0,
                                               globalThreads,
                                               localThreads,
                                               0,

NULL, &evt);

clWaitForEvents(1, &evt);

…

에서 걸리는데

clWaitForEvents 을 호출하면 대기하는 동안 EXC_BAD_ACCESS 가 발생한다.

즉 잘못된 메모리를 참조 하는 것으로 발생하는 것이다.

이것은 event list의 오류로 보기 보다는 kernel에서 메모리 참조 오류로 보아야 하는게 아닌가 생각이 든다.

처음부터 찬찬히 보면은

…

#define DATA_SIZE (128*128) // for test runs, 데이터의 크기는 128 * 128개의 크기이다.

….

main()

{

...

cl_float* h_in = (float*) malloc( sizeof(cl_float4) * DATA_SIZE); // input to device
   cl_float* h_out = (float*) malloc( sizeof(cl_float4) * DATA_SIZE); // output from device
   for( int i = 0; i < DATA_SIZE; ++i) {
        h_in[i] = (float)i;
   }

h_in에 (128*128)개의 공간을 할당하고 그 메모리를 초기화 한다.

...
}

위에서 코드를 보면 cl_float와 float 그리고 cl_float4가 혼용되어 있다.

이를 아래와 같이 수정해둔다.

...

  cl_float* h_in = (float*) malloc( sizeof(cl_float) * DATA_SIZE); // input to device
    cl_float* h_out = (float*) malloc( sizeof(cl_float) * DATA_SIZE); // output from device
   for( int i = 0; i < DATA_SIZE; ++i) {
        h_in[i] = (cl_float)i;
   }

…

// Search for a CPU/GPU device through the installed platforms
// Build a OpenCL program and do not run it.
for(cl_uint i = 0; i < numOfPlatforms; i++ ) {

…..

    /* For each device, create a buffer and partition that data among the devices for compute! */
    cl_mem memInObj = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                    sizeof(cl_float4) * (DATA_SIZE), h_in, &error);
    if(error != CL_SUCCESS) {
        perror("Can't create an input buffer object");
        exit(1);
    }

…

int offset = 0;

for(int i = 0; i < numOfDevices; ++i, ++offset ) {

…

  /* Loop thru each kernel and execute on device */
            // 커널의 갯수 만큼 반복한다.
        for(cl_uint j = 0; j < numOfKernels; j++) {

…

            cl_mem memOutObj = clCreateBuffer(context, CL_MEM_WRITE_ONLY ,
                                              sizeof(cl_float4) * (DATA_SIZE), NULL, &error);

            if(error != CL_SUCCESS) {
                perror("Can't create an output buffer object");
                exit(1);
            }

            /* Let OpenCL know that the kernel is suppose to receive two arguments */
            error = clSetKernelArg(kernels[j], 0, sizeof(cl_mem), &memInObj);
            if (error != CL_SUCCESS) {
                perror("Unable to set buffer object in kernel");
                exit(1);
            }

            error = clSetKernelArg(kernels[j], 1, sizeof(cl_mem), &memOutObj);
            if (error != CL_SUCCESS) {
                perror("Unable to set buffer object in kernel");
                exit(1);
            }

…

  // 커널을 커맨드 큐에 넣는다.
            /* Enqueue the kernel to the command queue */
                size_t globalThreads[3]= { 1024 , 2 , 1 };
  size_t localThreads[3] = { 128 , 1 , 1 } ;
  cl_event evt ;//= new cl_event[1];

            error = clEnqueueNDRangeKernel(cQ,
                                               kernels[j],
                                               2,
                                               0,
                                               globalThreads,
                                               localThreads,
                                               0,
                                               NULL, &evt);

                clWaitForEvents(1, (const cl_event*)&evt);

…

            /* Enqueue the read-back from device to host */
            error = clEnqueueReadBuffer(cQ, memOutObj,
                                        CL_TRUE,               // blocking read
                                        0,         // write from the last offset
                                        (DATA_SIZE)*sizeof(cl_float4),           // how much to copy
                                        h_out, 0, NULL, NULL);

…

  /* Check the returned data */
            if ( valuesOK(h_in, h_out, DATA_SIZE) ) {
                printf("Check passed!\n");
            } else printf("Check failed!\n");

            /* Release the resources */
            clReleaseCommandQueue(cQ);
            clReleaseMemObject(memOutObj);

}

이제 실행 결과를 보면 아래와 같다.

…

=> Task has been enqueued successfully!
Checking data of size: 16384
to:0.000000, from:0.000000
to:1.000000, from:1.000000
to:2.000000, from:2.000000
to:3.000000, from:3.000000
to:1.000000, from:0.000000

위와 같이 제대로 데이터가 카피가 안된다.

OpenCL 함수는 아래와 같이 코딩되어 있다.

#define WIDTH 128
#define DATA_TYPE float4

/*
The following macros are convenience 'functions'
for striding across a 2-D array of coordinates (x,y)
by a factor which happens to be the width of the block
i.e. WIDTH
*/
#define A(x,y) A[(x)* WIDTH + (y)]
#define C(x,y) C[(x)* WIDTH + (y)]
__kernel void copy2Dfloat4(__global DATA_TYPE *A, __global DATA_TYPE *C)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    // its like a vector load/store of 4 elements
    C(x,y) = A(x,y);

}

global id를 두가지를 받아 와서 동작하돌고 만들어져 있다.

Kernel과 work item에 대해서 관계를 정리하면 아래와 같다.

커널은 하나의 워크 아이템에 대해서 한번만 실행이 된다.

각각의 워크 아이템은 하나의 private 메모리 공간을 가지고 있다.

워크 아이템을 묶어서 work group으로 나타내어 진다.

워크 그룹은 local memory를 가진다.

전체 워크 아이템의 갯수는 global work size로 명시된다.

global과 constant 메모리는 모든 워크 아이템과 워크 그룹 사이에서 공유된다

위의 그림에서

하나의 사각형은 워크 아이템을 나타낸다.

그리고 그룹핑된 박스는 워크 그룹을 나타낸다.

OpenCL은 Dimension이라는 개념을 기반으로 움직인다. 이것은 work item을 인덱스를 이용해서 선언할 수 있다는 의미이다.

위의 그림에서 work group의 크기는 Sx = 4 , Sy = 4 가 된다.

얼마나 큰 dimension을 사용 할 지는 전적으로 프로그래머에게 달려 있다.

그러나 전체적으로 물리적인 한계 때문에 그룹당 워크 아이템이라던가 global index등의 한계는 존재하게 된다.

커널의 내에서는 핸재의 커널이 할당된 work item의 위치를 물어 볼 수 있다.

get_global_id(dim)

으로 알아 낼 수 있다.

위의 함수는

get_local_size(dim)*get_group_id(dim) + get_local_id(dim).

과 동일한 결과를 가지고 온디ㅏ.

get_local_size(dim) : group size

get_group_id(dim) : group position in dim

get_local_id(dim) : he position of a work item relative to the group

그림으로 나타내면 아래와 같다.

연산을 어떻게 할 것인가를 나타내기 위해서는

현재의 위치를 알고 현재의 위치에 맞는 크기를 가지고 와야 한다.

global size는 global thread로 코드에 표시가 되는데

1024개의 글로벌 Size가 있다면

128 * 128 / 1024 = 16 이 되어야 한다.

local 은 global을 local단위로 잘라내는 역활을 한다.

            /* Enqueue the kernel to the command queue */
                size_t globalThreads[3]= { 256 , 64 , 1 };
//                globalThreads[0]=1024;
//                globalThreads[1]=1024;

                size_t localThreads[3] = { 128 , 1 , 1 } ;
//                localThreads[0] = 128;
//                localThreads[1] = 128;

위와 같이 되어 있다면

256 x 64의 global size를 가지게 된다.

이것을 128 x 1으로 나누어서 한번에 계산하게 되는 것이다.

이때 Kernel은 현재의 위치를 알기 위해서는

global_id(0)와 global_id(1)을 확인해야 한다.

OpenCL에서

#define A(x,y) A[(x)* WIDTH + (y)]

#define C(x,y) C[(x)* WIDTH + (y)]

int x = get_global_id(0);
int y = get_global_id(1);

으로 되어 있다.

따라서 global thread가 0/1/2가 있는데 0은 OpenCL에서 Index(0)를 1은 Index(1)을 반영한다.(고 가정할 수 있다.)

그렇게 되면 위에서 처럼 256 x 64 이면 16384가 되므로 (1024,1,1)을 초과하여서 안되고

일단 1024,1,1로만 한정하여서 다시 해본다.

size_t globalThreads[3]= { 1024 , 1 , 1 };

로 지정하면 global thread의 개수가 1024개까지 실행이 된다고 볼 수 있다.

여기서 local thread는 global thread를 나누어서 실행하는 것을 의미한다.

여기서는 128,1,1로 되어 있다면

1024를 8개 Work Group으로 나누어서 실행한다고 볼 수 있다.

즉 한번에 128개를 동시에 실행하는 것이다.

여기까지는 문제가 없다.

OpenCL Code를 보면

#define A(x,y) A[(x)* WIDTH + (y)]
#define C(x,y) C[(x)* WIDTH + (y)]
__kernel void copy2Dfloat4(__global DATA_TYPE *A, __global DATA_TYPE *C)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    //int y = 0;
    // its like a vector load/store of 4 elements
    C(x,y) = A(x,y);

}

…

위와 같은데

위치를 계산하는 x와 y를 보면

x * width + y의 값을 가진다.

그런데 x가 index 0를 y가 index 1을 가지고 오고 있다.

x축과 y축으로 본다면 index 0와 index 1이 바뀐 것으로 볼 수 있다.

따라서 다음과 같이 한다.

int x = get_global_id(1);
int y = get_global_id(0);

…

으로 하고 컴파일하면

…

from:1019.000000, to:1019.000000
from:1020.000000, to:1020.000000
from:1021.000000, to:1021.000000
from:1022.000000, to:1022.000000
from:1023.000000, to:1023.000000
Check passed!
Program ended with exit code: 0

으로 테스트가 통과하고 있다.

결국 OpenCL에서 Index의 선택이 잘 못되어 있었다.

그런데

local thread를

size_t localThreads[3] = { 512 , 1 , 1 } ; —> FAIL

size_t localThreads[3] = { 256 , 1 , 1 } ; —> FAIL

size_t localThreads[3] = { 128 , 1 , 1 } ; —> WORK

로 된다.

Local Thread의 갯수가 128일때 동작을 한다.

이런 것을 일일이 찾아서 만드는 것이 귀찮으면, 그리고 플랫폼에 따른 커널의 일관성을 유지하기가 쉽지 않을 것 같으므로

Local Thread를 NULL로 해두면 OpenCL Compiler가 알아서 할당하게 된다.

아래 코드로 해도 정상적으로 실행이 된다.

            error = clEnqueueNDRangeKernel(cQ,
                                               kernels[j],
                                               2,
                                               0,
                                               globalThreads,
                                               NULL,
                                               0,
                                               NULL, &evt);

…

테스트를 위해서 다음과 같이 크기를 변경하였다.

#define DATA_SIZE (128*128) // for test runs,

그랬더니 다음과 같은 오류가 발생하였다.

...

=> Task has been enqueued successfully!
Checking data of size: 16384
from:4096.000000, to:0.000000
from:4097.000000, to:0.000000
from:4098.000000, to:0.000000
from:4099.000000, to:0.000000
Check failed!
Program ended with exit code: 0

…

4096에서 오류가 발생한 것이다.

복사가 안된 것이다.

…

size_t globalThreads[3]= { 512 , 1 , 1 };

로 바꾼뒤에 다시 실행하면

=> Task has been enqueued successfully!
Checking data of size: 16384
from:2048.000000, to:0.000000
from:2049.000000, to:0.000000
from:2050.000000, to:0.000000
from:2051.000000, to:0.000000
Check failed!

으로 2048에서 오류가 발생한다.

마찬가지로

...

size_t globalThreads[3]= { 256 , 1 , 1 };

...

=> Task has been enqueued successfully!
Checking data of size: 16384
from:1024.000000, to:0.000000
from:1025.000000, to:0.000000
from:1026.000000, to:0.000000
from:1027.000000, to:0.000000

…

1024일때 4096

512일 경우 2048

256 일 경우 1024

이후에 오류가 발생한다.

즉 Global Size * 4에서 부터 오류가 발생하는 것이다.

체크를 하기 위햇 더 프린트를 해 보면 아래와 같다.

Checking data of size: 16384
from:1024.000000, to:0.000000
from:1025.000000, to:0.000000
from:1026.000000, to:0.000000
from:1027.000000, to:0.000000
from:0.000000, to:0.000000
from:0.000000, to:0.000000
Check failed!

…

값이 1027 이후에는 아예 없어졌다.

현상은 두가지 이다. 하나는 1023까지는 정확하게 가는데

1024부터는 안가는 현상이고

두번째는 1027 이후 1028부터는 아예 데이터가 없다는 점이다.

원래 부터 없었는지 확인을 해보기 위해서 아래와 같은 코드를 넣어 보자

  printf("Before start check data \n");
                printf(" from:%f, to:%f\n", h_in[1024+0] ,h_out[1024+0]);
                printf(" from:%f, to:%f\n", h_in[1024+1] ,h_out[1024+1]);
                printf(" from:%f, to:%f\n", h_in[1024+2] ,h_out[1024+2]);
                printf(" from:%f, to:%f\n", h_in[1024+3] ,h_out[1024+3]);
                printf(" from:%f, to:%f\n", h_in[1024+4] ,h_out[1024+4]);

결과는 아래와 같다.

…

Before start check data

from:1024.000000, to:0.000000
from:1025.000000, to:0.000000
from:1026.000000, to:0.000000
from:1027.000000, to:0.000000
from:0.000000, to:0.000000

…

우선 1028부터는 아예 값이 안들어간 문제부터 확인해보자

for( int i = 0; i < DATA_SIZE; ++i) {
…

}

위의 코드를 보면

데이터 크기까지만 카피 하도록 되어 있다.

그런데 데이터 크기의 타입이 float4이므로 실제로는 그 4배가 되어야 한다.

  for( int i = 0; i < DATA_SIZE*4; ++i) {
        h_in[i] = (float)(i);
    }

위와 같이 수정하여서 실행하면 다음과 같은 결과를 얻는다.

…

Checking data of size: 16384
from:1024.000000, to:0.000000
from:1025.000000, to:0.000000
from:1026.000000, to:0.000000
from:1027.000000, to:0.000000
from:1028.000000, to:0.000000
from:1029.000000, to:0.000000
Check failed!
Program ended with exit code: 0

...

이제 문제중 두번째 문제는 해결 되었다.

첫번째 문제인 1024 이후에는 복사를 하지 않는 문제를 해결 해 보자

아까 말한대로 global thread의 4배까지만 복사해주는 문제가 있다.

이는 Enqueue하는 것이 global thread까지만 한다고 가정 할 수 있지 않을까 ?

그런데 여기서 전체적으로 작업해야 할 크기가 1024이고 float4이므로 실제로는 4096까지 간다는 것은 이해 할 수 있다.

한번 이것을 늘려 보자

그러면

size_t globalThreads[3]= { (128 * 128 / 4 ) , 1 , 1 };

…

으로 하면

…

Running GPU ........
numOfKernels : 1
=> Kernel name: copy2Dfloat4 with arity: 2
=> About to create command queue and enqueue this kernel...
[0000] Before start check data
from:1024.000000, to:1024.000000
from:1025.000000, to:1025.000000
from:1026.000000, to:1026.000000
from:1027.000000, to:1027.000000
from:1028.000000, to:1028.000000
=> Task has been enqueued successfully!
Checking data of size: 16384
Check passed!

정확하게 종료 하는 것을 알 수 있다.

따라서 global thread 값은 실제 수행해야 하는 work item의 갯수를 의미한다.

이를 local thread로 나누어서 수행하는 것이다.

이제 일반화 시키기 위해서 다음과 같이 한다.

size_t globalThreads[3]= { (DATA_SIZE / 4 ) , 1 , 1 };

...

그래도 결과는 성공적으로 끝난다.

이번에는 2차원으로 해보자

size_t globalThreads[3]= { (DATA_SIZE / 4 ) >> 1 , 1 << 1 , 1 };
..

from:12288.000000, to:0.000000
from:12289.000000, to:0.000000
from:12290.000000, to:0.000000
from:12291.000000, to:0.000000
from:12292.000000, to:0.000000
from:12293.000000, to:0.000000
Check failed!

위에서 오류가 발생하였음을 알 수 있다.

size_t globalThreads[3]= { (DATA_SIZE / 4 ) >> 1 , 1 << 1 , 1 };

에 대한 OpenCL에서 Code를 반영하지 않아서 생긴 문제이다.

...

#define WIDTH (16384/8)

…

으로 하면 정상적으로 실행이 된다.

size_t globalThreads[3]= { (DATA_SIZE / 4 ) >> 2 , 1 << 2 , 1 };

#define WIDTH (16384/16)

위와 같이 되어도 정상적으로 동작을 한다.

정리하면

size_t globalThreads[3]= { (DATA_SIZE ) , 1 , 1 };

#define WIDTH (16384)

size_t globalThreads[3]= { (DATA_SIZE /2 ) , 1 , 1 };

#define WIDTH (16384/2)

size_t globalThreads[3]= { (DATA_SIZE / 4 ) , 1 , 1 };

#define WIDTH (16384/4)

size_t globalThreads[3]= { (DATA_SIZE / 8 ) , 2 , 1 };

#define WIDTH (16384/8)

size_t globalThreads[3]= { (DATA_SIZE / 16 ) , 4 , 1 };

#define WIDTH (16384/16)

가 정상적으로 동작하는 케이스이다.

2차원의 값이 증가하는 경우 숫자가 증가하므로 1차원의 숫자가 /2가 된다.

개념은 위와 같은 모델이 된다.

잘 생각해보면 당연한 이야기 이다.

저작자표시

'Computer Vision' 카테고리의 다른 글

증강 현실 테스트 (0)	2014.10.04
OpenCL::Query OpenCL kernel (4) (0)	2014.05.02
OpenCL Test Program (3) (0)	2014.03.31
OpenCL Test Program (2) (0)	2014.03.18
OpenCL test program (1) (0)	2014.03.15

Posted by GUNDAM_IM

Computer Vision2014. 5. 2. 13:16

OpenCL::Query OpenCL kernel (4)

이 프로그램은 OpenCL 코드를 불러들여서 컴파일하여서 kernel을 만드는데 까지 진행한다.

실행은 아직이다.

우선 크게 보면 main program에서 platform id를 확인하고

platform id로 context를 만든다.

만들어진 context에 OpenCL 프로그램을 불러들여서 build를 하고 kernel로 준비해 준다.

그리고 해당 kernel의 정보를 받아와서 프린트 해 주는 데까지 진행한다.

OpenCL은 Runtime compile을 한다. 따라서 미리미리 컴파일을 해 고 진행하여야 한다.

이 예제에서는 Program과 Kernel을 만드는데 까지 진행한 예제이다.

(1) Platform ID의 개수를 확인한다.

(2) 확인된 Platform ID의 개수 만큼 공간을 확보한다.

(3) 확보된 메모리 공간에 Platform ID를 받아온다.

(4) Platform ID개수만큼

1) Device ID중 GPU 인것을 받아온다.

2) GPU인 것이 없으면 CPU 인 것을 받아온다.

3) Device ID에 Context를 생성한다.

4) OpenCL 프로그램 소스를 로 드

5) 로드된 프로그램소스를 빌드하여서 프로그램 화 한다.

이 단계는 CreateProgramWithSource -> BuildProgram 을 거치는 2단계로 구성된다.

6) 빌드 (컴파일)에서 문제가 생기면 그 정보(LOG)를 받아서 프린트 해준다.

(5) clCreateKernelsInProgram 프로그램을 가지고 커널을 만든다.

마찬가지로 2단계로 한다.

우선 커널의 개수를 확인하고

확인된 커널의 개수만큼 메모리를 할당받고

해당 할당받은 메모리에다가 커널을 빌드한다.

(6) 확인된 커널의 개수만큼

1) 커널의 이름을 받아오고

2) 인자의 개수를 받아온다.

3) 받아온 커널의 이름과 인자의 개수를 프린트한다.

복잡하지만 줄여보면 다음과 같다.

기본 프레임은

(1) 필요한 요소의 개수를 물어보고

(2) 그만큼 메모리 공간을 할당 받고

(3) 할당받은 메모리 공간에 요소를 받아온다.

이다.

이것을

Platform ID -> Device ID -> context -> Kernel

순으로 진행한다.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <alloca.h>

#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif

/*
프로그램을 읽어 들인다.
OpenCL은 On-The-Fly Compile을 수행한다.
*/
void loadProgramSource(const char** files,
                       size_t length,
                       char** buffer,
                       size_t* sizes) {

    /* Read each source file (*.cl) and store the contents into a temporary datastore */
    for(size_t i=0; i < length; i++) {
        FILE* file = fopen(files[i], "r");
        if(file == NULL) {
            perror("Couldn't read the program file");
            exit(1);
        }

        /* 파일의 끝까지 진행하여서 크기를 확인하고 다시 처음으로 돌아온다.*/
        fseek(file, 0, SEEK_END);
        sizes[i] = ftell(file);
        rewind(file); // reset the file pointer so that 'fread' reads from the front

        /* 크기 +1 만큼 할당한다음 */
        buffer[i] = (char*)malloc(sizes[i]+1);
        buffer[i][sizes[i]] =  '\0';

        /* 한방에 읽어 들이다. */
        fread(buffer[i], sizeof(char), sizes[i], file);
        fclose(file);
    }
}

int main(int argc, char** argv) {

    /* OpenCL 1.1 data structures */
    cl_platform_id* platforms;
    cl_program program;
    cl_device_id device;
    cl_context context;

    /* OpenCL 1.1 scalar data types */
    cl_uint numOfPlatforms;
    cl_int error;

    /*
     Get the number of platforms
     Remember that for each vendor's SDK installed on the computer,
     the number of available platform also increased.
     */

/* Step 1 : Platform ID의 개수를 확인한다. */
    error = clGetPlatformIDs(0, NULL, &numOfPlatforms);
    if(error != CL_SUCCESS) {
        perror("Unable to find any OpenCL platforms");
        exit(1);
    }

/* Step 2 : 확인된 Platform ID의 개수만큼 메모리 공간을 확보한다. */
platforms = (cl_platform_id*) alloca(sizeof(cl_platform_id) * numOfPlatforms);
    printf("Number of OpenCL platforms found: %d\n", numOfPlatforms);

/* Step 3 : 확보된 메모리 공간에 Platform ID 데이터를 받아온다. */
error = clGetPlatformIDs(numOfPlatforms, platforms, NULL);
    if(error != CL_SUCCESS) {
        perror("Unable to find any OpenCL platforms");
        exit(1);
    }

    // GPU를 찾는다.
    // Search for a CPU/GPU device through the installed platforms
    // Build a OpenCL program and do not run it.

// Platform의 개수 만큼 진행한다.
for(cl_uint i = 0; i < numOfPlatforms; i++ ) {

// Get the GPU device

// Step 1 : Device ID 중에서 GPU 인 것을 확인한다.
        error = clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU, 1, &device, NULL);
        if(error != CL_SUCCESS) {
            // Otherwise, get the CPU
// Step 1-1 : Device ID 중 GPU가 없으면 CPU인 것을 확인한다.
error = clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_CPU, 1, &device, NULL);
        }
        if(error != CL_SUCCESS) {
            perror("Can't locate any OpenCL compliant device");
            exit(1);
        }

        /* Context를 생성한다. */
// Step 2 : Device ID에 Context를 생성한다.
context = clCreateContext(NULL, 1, &device, NULL, NULL, &error);
        if(error != CL_SUCCESS) {
            perror("Can't create a valid OpenCL context");
            exit(1);
        }

        /* Load the two source files into temporary datastores */
        const char *file_names[] = {"/Users/freegear/Desktop/OpenCL/cl_test1/KernelQuery/KernelQuery/simple.cl", "/Users/freegear/Desktop/OpenCL/cl_test1/KernelQuery/KernelQuery/simple_2.cl"};
        const int NUMBER_OF_FILES = 2;
        char* buffer[NUMBER_OF_FILES];
        size_t sizes[NUMBER_OF_FILES];
        loadProgramSource(file_names, NUMBER_OF_FILES, buffer, sizes);

        /* OpenCL program object를 생성한다.*/

// Step 3 : Context에 프로그램 source를 컴파일하여 program을 생성한다.
        program = clCreateProgramWithSource(context, NUMBER_OF_FILES, (const char**)buffer, sizes, &error);
        if(error != CL_SUCCESS) {
            perror("Can't create the OpenCL program object");
            exit(1);
        }

        /* Build OpenCL program object and dump the error message, if any */
        char *program_log;
        const char options[] = "-cl-finite-math-only -cl-no-signed-zeros";
        size_t log_size;
        //error = clBuildProgram(program, 1, &device, argv[1], NULL, NULL);
        // Uncomment the line below, comment the line above; re-build the program to use build options statically
// Step 4: 생성된 프로그램을 빌드한다.
error = clBuildProgram(program, 1, &device, options, NULL, NULL);

        if(error != CL_SUCCESS) {
            // If there's an error whilst building the program, dump the log
            clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &log_size);
            program_log = (char*) malloc(log_size+1);
            program_log[log_size] = '\0';

// Step 4-1: 빌드 정보에 대한 Log를 확인한다.
  clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                                  log_size+1, program_log, NULL);

            printf("\n=== ERROR ===\n\n%s\n=============\n", program_log);
            free(program_log);
            exit(1);
        }

        /* Query the program as to how many kernels were detected */
        cl_uint numOfKernels;

// Step 5 : Kernel의 개수를 학인한다.
        error = clCreateKernelsInProgram(program, 0, NULL, &numOfKernels);
        if (error != CL_SUCCESS) {
            perror("Unable to retrieve kernel count from program");
            exit(1);
        }

// Step 6 : 확인된 Kernel의 개수만큼 메모리를 할당한다.
        cl_kernel* kernels = (cl_kernel*) alloca(sizeof(cl_kernel) * numOfKernels);
        error = clCreateKernelsInProgram(program, numOfKernels, kernels, NULL);

// Step 7 : 확인된 Kernel의 개수만큼 반복한다.
  for(cl_uint i = 0; i < numOfKernels; i++) {
            char kernelName[32];
            cl_uint argCnt;
// Step 7-1 : 확인된 Kernel의 이름을 받아온다.
  clGetKernelInfo(kernels[i], CL_KERNEL_FUNCTION_NAME, sizeof(kernelName), kernelName, NULL);
// Step 7-2 : 확인된 Kernel 인자의 개수를 받아온다.
  clGetKernelInfo(kernels[i], CL_KERNEL_NUM_ARGS, sizeof(argCnt), &argCnt, NULL);
            printf("Kernel name: %s with arity: %d\n", kernelName, argCnt);
        }

/* Clean up */

// Step 8 : 청소하기
  for(cl_uint i = 0; i < numOfKernels; i++) { clReleaseKernel(kernels[i]); }
        for(i=0; i< NUMBER_OF_FILES; i++) { free(buffer[i]); }
        clReleaseProgram(program);
        clReleaseContext(context);
    }
}

실행 결과는 아래와 같다.

Number of OpenCL platforms found: 1

Kernel name: simpleAdd with arity: 3

Kernel name: simpleAdd_2 with arity: 3

Program ended with exit code: 0

찾아낸 플랫폼의 개수는 1개이고 여기서 포함되는 커널의 개수는 2개이다.

각각

simpleAdd

simpleAdd_2

가 된다.

OpenCL source 코드는 아래와 같이 되어 있고 여기서는 아직 중요한 포인트가 아니므로 그냥 넘어간다.

simple.cl

__kernel void simpleAdd(__global float *a,

__global float *b,

__global float *c) {

*c = *a + *b;

}

simple_2.cl

__kernel void simpleAdd_2(__global float *a,

__global float *b,

__global float *c) {

*c = *a + *b;

}

저작자표시

'Computer Vision' 카테고리의 다른 글

증강 현실 테스트 (0)	2014.10.04
OpenCL : Using work item to partition data (5) (1)	2014.06.01
OpenCL Test Program (3) (0)	2014.03.31
OpenCL Test Program (2) (0)	2014.03.18
OpenCL test program (1) (0)	2014.03.15

Posted by GUNDAM_IM

Computer Vision2014. 3. 31. 19:28

OpenCL Test Program (3)

디바이스 별로 고유한 확장 기능을 부여할 수 있다.

이런 기능이 추가되어 있는지 확인을 한다.

추가되는 함수는 아래에 표시되어 있다.

void displayDeviceInfo( cl_platform_id id ,

cl_device_type dev_type)

......

displayDeviceDetails( devices[i], CL_DEVICE_EXTENSIONS, "CL_DEVICE_EXTENSIONS" );

.....

void displayDeviceDetails( cl_device_id id,

cl_device_info param_name,

const char * paramNameAsStr)

...

switch(param_name)

{

......

break;

case CL_DEVICE_EXTENSIONS : {

// beware of buffer overflow; alternatively use the OpenCL C++ bindings

char* extension_info[4096];

error = clGetDeviceInfo( id, CL_DEVICE_EXTENSIONS, sizeof(extension_info),

extension_info, NULL);

printf("\tSupported extensions: %s\n", extension_info);

}break;

}

....

확장에 대한 정보를 확인하기 위해서 CL_DEVICE_EXTENSONS를 확인하면 된다.

세부 정보에서 해당 정보를 프린트하도록 한다.

완성된 코드와 실행 결과는 아래와 같다.

// main.cpp

// TestOpenCL

// Created by freegear on 2014. 2. 8..

#include <iostream>

#include <stdio.h>

#include <stdlib.h>

#ifdef __APPLE__

# include <OpenCL/opencl.h>

#else

# include <CL/cl.h>

#endif

//Function proto type

void displayDeviceDetails(cl_device_id id ,

cl_device_info param_name,

const char * paramNameAsStr);

void displayDeviceInfo( cl_platform_id id ,

cl_device_type dev_type)

{

/* OpenCL 1.1 device type */

cl_int error = 0 ;

cl_uint numOfDevices = 0 ;

/* Determine how many devices are connected to your platform */

error = clGetDeviceIDs(id, dev_type, 0, NULL, &numOfDevices);

if( error != CL_SUCCESS){

perror("Unable to obtain any OpenCL compliant device info");

exit(1);

}

cl_device_id * devices = (cl_device_id*)alloca(sizeof(cl_device_id) * numOfDevices);

/* Load the information about your devices into the variable 'devices' */

error = clGetDeviceIDs(id, dev_type, numOfDevices, devices, NULL);

if (error != CL_SUCCESS)

{

perror("Unable to obtain any OpenCL compliant device info");

exit(1);

}

printf("Numnber of detected OpenCL devices %d\n", numOfDevices);

/* we attempt to retrieve some information about the devices */

for( int i = 0 ; i < numOfDevices ; ++i)

{

displayDeviceDetails(devices[i], CL_DEVICE_TYPE, "CL_DEVICE_TYPE");

displayDeviceDetails(devices[i], CL_DEVICE_VENDOR_ID, "CL_DEVICE_VENDOR_ID");

displayDeviceDetails(devices[i], CL_DEVICE_MAX_COMPUTE_UNITS, "CL_DEVICE_MAX_COMPUTE_UNITS");

displayDeviceDetails(devices[i], CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS");

displayDeviceDetails(devices[i], CL_DEVICE_MAX_WORK_ITEM_SIZES, "CL_DEVICE_MAX_WORK_ITEM_SIZES");

displayDeviceDetails(devices[i], CL_DEVICE_MAX_WORK_GROUP_SIZE, "CL_DEVICE_MAX_WORK_GROUP_SIZE");

displayDeviceDetails( devices[i], CL_DEVICE_EXTENSIONS, "CL_DEVICE_EXTENSIONS" );

}

void displayDeviceDetails( cl_device_id id,

cl_device_info param_name,

const char * paramNameAsStr)

{

cl_int error = 0 ;

size_t paramSize = 0 ;

error = clGetDeviceInfo( id ,param_name, 0 , NULL, &paramSize);

if(error != CL_SUCCESS)

{

perror("Unable to obtain device info for param");

return;

}

The cl_device_info are preporcessor directives define in cl.h

switch(param_name)

{

case CL_DEVICE_TYPE :

{

cl_device_type * devType = (cl_device_type*)alloca(sizeof(cl_device_type)*paramSize);

error = clGetDeviceInfo(id, param_name, paramSize, devType, NULL);

if(error != CL_SUCCESS)

{

perror("Unable to optain device info for param\n");

return;

}

switch( *devType)

{

case CL_DEVICE_TYPE_CPU : printf("CPU Detected \n"); break;

case CL_DEVICE_TYPE_GPU : printf("GPU Detected \n"); break;

case CL_DEVICE_TYPE_ACCELERATOR : printf("Accelerator detected\n");break;

case CL_DEVICE_TYPE_DEFAULT : printf("default detected \n"); break;

}

break;

}

case CL_DEVICE_VENDOR_ID :

case CL_DEVICE_MAX_COMPUTE_UNITS :

case CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS : {

cl_uint* ret = (cl_uint*) alloca(sizeof(cl_uint) * paramSize);

error = clGetDeviceInfo( id, param_name, paramSize, ret, NULL );

if (error != CL_SUCCESS ) {

perror("Unable to obtain device info for param\n");

return;

}

switch (param_name) {

case CL_DEVICE_VENDOR_ID: printf("\tVENDOR ID: 0x%x\n", *ret); break;

case CL_DEVICE_MAX_COMPUTE_UNITS: printf("\tMaximum number of parallel compute units: %d\n", *ret); break;

case CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: printf("\tMaximum dimensions for global/local work-item IDs: %d\n", *ret); break;

}

break;

case CL_DEVICE_MAX_WORK_ITEM_SIZES :

{

cl_uint maxWIDimensions;

size_t* ret = (size_t*) alloca(sizeof(size_t) * paramSize);

error = clGetDeviceInfo( id, param_name, paramSize, ret, NULL );

error = clGetDeviceInfo( id, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, sizeof(cl_uint), &maxWIDimensions, NULL );

if (error != CL_SUCCESS ) {

perror("Unable to obtain device info for param\n");

return;

}

printf("\tMaximum number of work-items in each dimension: ( ");

for(cl_int i =0; i < maxWIDimensions; ++i ) {

printf("%d ", (int)ret[i]);

}

printf(" )\n");

}break;

case CL_DEVICE_MAX_WORK_GROUP_SIZE :

{

size_t* ret = (size_t*) alloca(sizeof(size_t) * paramSize);

error = clGetDeviceInfo( id, param_name, paramSize, ret, NULL );

if (error != CL_SUCCESS ) {

perror("Unable to obtain device info for param\n");

return;

}

printf("\tMaximum number of work-items in a work-group: %d\n", (int)*ret);

}

break;

case CL_DEVICE_NAME :

case CL_DEVICE_VENDOR :

{

char data[48];

error = clGetDeviceInfo( id, param_name, paramSize, data, NULL );

if (error != CL_SUCCESS ) {

perror("Unable to obtain device name/vendor info for param\n");

return;

}

switch (param_name) {

case CL_DEVICE_NAME : printf("\tDevice name is %s\n", data);break;

case CL_DEVICE_VENDOR : printf("\tDevice vendor is %s\n", data);break;

}

break;

case CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: {

cl_uint* size = (cl_uint*) alloca(sizeof(cl_uint) * paramSize);

error = clGetDeviceInfo( id, param_name, paramSize, size, NULL );

if (error != CL_SUCCESS ) {

perror("Unable to obtain device name/vendor info for param\n");

return;

}

printf("\tDevice global cacheline size: %d bytes\n", (*size)); break;

} break;

case CL_DEVICE_GLOBAL_MEM_SIZE:

case CL_DEVICE_MAX_MEM_ALLOC_SIZE:

{

cl_ulong* size = (cl_ulong*) alloca(sizeof(cl_ulong) * paramSize);

error = clGetDeviceInfo( id, param_name, paramSize, size, NULL );

if (error != CL_SUCCESS ) {

perror("Unable to obtain device name/vendor info for param\n");

return;

}

switch (param_name) {

case CL_DEVICE_GLOBAL_MEM_SIZE: printf("\tDevice global mem: %ld mega-bytes\n", (*size)>>20); break;

case CL_DEVICE_MAX_MEM_ALLOC_SIZE: printf("\tDevice max memory allocation: %ld mega-bytes\n", (*size)>>20); break;

}

break;

case CL_DEVICE_EXTENSIONS : {

// beware of buffer overflow; alternatively use the OpenCL C++ bindings

char* extension_info[4096];

error = clGetDeviceInfo( id, CL_DEVICE_EXTENSIONS, sizeof(extension_info), extension_info, NULL);

printf("\tSupported extensions: %s\n", extension_info);

}break;

}//end of switch(param_name

}// end of displayDeviceDetailes

void displayPlatformInfo(

cl_platform_id id ,

cl_platform_info param_name,

const char * paramNameAsStr

)

{

cl_int error = 0 ;

size_t paramSize = 0 ;

error = clGetPlatformInfo(

id,

param_name,

NULL,

&paramSize

);

char * moreInfo = (char *) alloca(sizeof(char)*paramSize);

error = clGetPlatformInfo(

id, /* The platform ID returned by clGetPlatformIDs or can be NULL.

If platform is NULL, the behavior is implementation-defined.*/

param_name, /* An enumeration constant that identifies the platform

information being queried. It can be one of the following values

as specified in the table below. */

paramSize, /* Specifies the size in bytes of memory pointed to by param_value.

This size in bytes must be greater than or equal to size of return type

specified in the table below. */

moreInfo, /* A pointer to memory location where appropriate values for a given

param_value will be returned. Acceptable param_value values are listed in

the table below. If param_value is NULL, it is ignored.*/

NULL /* Returns the actual size in bytes of data being queried by param_value.

If param_value_size_ret is NULL, it is ignored */

);

if( error != CL_SUCCESS)

{

if ( error == CL_INVALID_PLATFORM)

perror("Error CL_INVALID_PLATFORM");

else if (error == CL_INVALID_VALUE)

perror("Error CL_INVALID_VALUE");

else

perror("Unable to find any OpenCL platform information");

return ;

}

printf("%s : %s\n" , paramNameAsStr, moreInfo);

} // end of displayPlatformInfo

int main(int argc, const char * argv[])

{

// OpenCL 1.2 Data struction

cl_platform_id * platforms ;

/* OpenCL 1.1 scalar data types */

cl_uint numOfPlatforms ;

cl_int error ;

Get the number of platforms

Remember that for each vendor's SDK installed on the computer,

the number of available platform also increased.

error = clGetPlatformIDs(0, /*

The number of cl_platform_id entries that can be added to platforms.

If platforms is not NULL, the num_entries must be greater than zero.

NULL, /* Returns a list of OpenCL platforms found.

The cl_platform_id values returned in platforms can be used to identify

a specific OpenCL platform. If platforms argument is NULL,

this argument is ignored.

The number of OpenCL platforms returned is the mininum of the value

specified by num_entries or the number of OpenCL platforms available. */

&numOfPlatforms

/* Returns the number of OpenCL platforms available.

If num_platforms is NULL, this argument is ignored. */

);

if ( error < 0)

{

perror("Unable to find any OpenCL platforms");

exit(1);

}

// allocate memory for the number of installed platforms

// alloca(....) occupies some stack space but is

// automatically freed on return

platforms = (cl_platform_id*) alloca(sizeof(cl_platform_id)*numOfPlatforms);

printf("Number of OpenCL platforms found : %d\n", numOfPlatforms);

error = clGetPlatformIDs(numOfPlatforms, /*

The number of cl_platform_id entries that can be added to platforms.

If platforms is not NULL, the num_entries must be greater than zero.

platforms, /* Returns a list of OpenCL platforms found.

The cl_platform_id values returned in platforms can be used to identify

a specific OpenCL platform. If platforms argument is NULL,

this argument is ignored.

The number of OpenCL platforms returned is the mininum of the value

specified by num_entries or the number of OpenCL platforms available. */

NULL

/* Returns the number of OpenCL platforms available.

If num_platforms is NULL, this argument is ignored. */

);

if ( error < 0)

{

perror("Unable to find any OpenCL platforms");

exit(1);

}

// We invoke the API 'clPlatformInfo' twice for each

// parameter we are trying to extract

// and we use the return value to create temporary data

// structure (on the stack) to store

// the returned information ot the second invocation.

for( cl_uint i = 0 ; i < numOfPlatforms ; ++i)

{

displayPlatformInfo( platforms[i], CL_PLATFORM_PROFILE , "CL_PLATFORM_PROFILE");

displayPlatformInfo( platforms[i], CL_PLATFORM_VERSION , "CL_PLATFORM_VERSION");

displayPlatformInfo( platforms[i], CL_PLATFORM_NAME , "CL_PLATFORM_NAME");

displayPlatformInfo( platforms[i], CL_PLATFORM_VENDOR , "CL_PLATFORM_VENDOR");

displayPlatformInfo( platforms[i], CL_PLATFORM_EXTENSIONS , "CL_PLATFORM_EXTENSIONS");

// Assume that we don't know how many devices are OpenCL compliant, we locate everything !

displayDeviceInfo( platforms[i], CL_DEVICE_TYPE_ALL );

}

return 0;

}

실행 결과

Number of OpenCL platforms found : 1

CL_PLATFORM_PROFILE : FULL_PROFILE

CL_PLATFORM_VERSION : OpenCL 1.2 (Aug 24 2013 21:03:27)

CL_PLATFORM_NAME : Apple

CL_PLATFORM_VENDOR : Apple

CL_PLATFORM_EXTENSIONS : cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

Numnber of detected OpenCL devices 3

CPU Detected

VENDOR ID: 0xffffffff

Maximum number of parallel compute units: 8

Maximum dimensions for global/local work-item IDs: 3

Maximum number of work-items in each dimension: ( 1024 1 1 )

Maximum number of work-items in a work-group: 1024

Supported extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats cl_APPLE_command_queue_priority

GPU Detected

VENDOR ID: 0x1022700

Maximum number of parallel compute units: 2

Maximum dimensions for global/local work-item IDs: 3

Maximum number of work-items in each dimension: ( 1024 1024 64 )

Maximum number of work-items in a work-group: 1024

Supported extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer

GPU Detected

VENDOR ID: 0x1024400

Maximum number of parallel compute units: 16

Maximum dimensions for global/local work-item IDs: 3

Maximum number of work-items in each dimension: ( 512 512 512 )

Maximum number of work-items in a work-group: 512

Program ended with exit code: 0

위에서 나온 용어에 대해서 다음페이지에서 설명한다.

저작자표시

'Computer Vision' 카테고리의 다른 글

OpenCL : Using work item to partition data (5) (1)	2014.06.01
OpenCL::Query OpenCL kernel (4) (0)	2014.05.02
OpenCL Test Program (2) (0)	2014.03.18
OpenCL test program (1) (0)	2014.03.15
OpenCL Architecture (0)	2014.03.15

Posted by GUNDAM_IM

Computer Vision2014. 3. 18. 10:46

OpenCL Test Program (2)

Device에 대한 정보를 얻는 코드이다.

추가되는 함수는

displayDeviceDetails

displayDeviceInfo

이다.

이들 함수는 loop내에서 호출된다.

for( cl_uint i = 0 ; i < numOfPlatforms ; ++i)

{

displayPlatformInfo( platforms[i], CL_PLATFORM_PROFILE , "CL_PLATFORM_PROFILE");

displayPlatformInfo( platforms[i], CL_PLATFORM_VERSION , "CL_PLATFORM_VERSION");

.........

// Assume that we don't know how many devices are OpenCL compliant, we locate everything !

displayDeviceInfo( platforms[i], CL_DEVICE_TYPE_ALL );

}

플랫폼에서 얻어진 디바이스의 정보를 프린트해주는 함수를 추가하고

그 함수의 내부에서 디바이스의 개수를 확인 한 뒤에 개수만큼 Loop를 돌려서 정보를 프린트한다.

즉 플랫폼 1개당 디바이스의 개수는 복수개가 검출 될 수 있는 것이다.

.....

/* Determine how many devices are connected to your platform */

error = clGetDeviceIDs(id, dev_type, 0, NULL, &numOfDevices); <= 디바이스의 개수를 확인

......

error = clGetDeviceIDs(id, dev_type, numOfDevices, devices, NULL); <= 확인된 개수만큼 정보를 가지고 옴

....

/* we attempt to retrieve some information about the devices */

for( int i = 0 ; i < numOfDevices ; ++i)

{

displayDeviceDetails(devices[i], CL_DEVICE_TYPE, "CL_DEVICE_TYPE");

displayDeviceDetails(devices[i], CL_DEVICE_VENDOR_ID, "CL_DEVICE_VENDOR_ID");

..........

displayDeviceDetails(devices[i], CL_DEVICE_MAX_WORK_GROUP_SIZE, "CL_DEVICE_MAX_WORK_GROUP_SIZE");

}

// main.cpp

// TestOpenCL

// Created by freegear on 2014. 2. 8..

#include <iostream>

#include <stdio.h>

#include <stdlib.h>

#ifdef __APPLE__

# include <OpenCL/opencl.h>

#else

# include <CL/cl.h>

#endif

//Function proto type

void displayDeviceDetails(cl_device_id id ,

cl_device_info param_name,

const char * paramNameAsStr);

void displayDeviceInfo( cl_platform_id id ,

cl_device_type dev_type)

{

/* OpenCL 1.1 device type */

cl_int error = 0 ;

cl_uint numOfDevices = 0 ;

/* Determine how many devices are connected to your platform */

error = clGetDeviceIDs(id, dev_type, 0, NULL, &numOfDevices);

if( error != CL_SUCCESS){

perror("Unable to obtain any OpenCL compliant device info");

exit(1);

}

cl_device_id * devices = (cl_device_id*)alloca(sizeof(cl_device_id) * numOfDevices);

/* Load the information about your devices into the variable 'devices' */

error = clGetDeviceIDs(id, dev_type, numOfDevices, devices, NULL);

if (error != CL_SUCCESS)

{

perror("Unable to obtain any OpenCL compliant device info");

exit(1);

}

printf("Numnber of detected OpenCL devices %d\n", numOfDevices);

/* we attempt to retrieve some information about the devices */

for( int i = 0 ; i < numOfDevices ; ++i)

{

displayDeviceDetails(devices[i], CL_DEVICE_TYPE, "CL_DEVICE_TYPE");

displayDeviceDetails(devices[i], CL_DEVICE_VENDOR_ID, "CL_DEVICE_VENDOR_ID");

displayDeviceDetails(devices[i], CL_DEVICE_MAX_COMPUTE_UNITS, "CL_DEVICE_MAX_COMPUTE_UNITS");

displayDeviceDetails(devices[i], CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS");

displayDeviceDetails(devices[i], CL_DEVICE_MAX_WORK_ITEM_SIZES, "CL_DEVICE_MAX_WORK_ITEM_SIZES");

displayDeviceDetails(devices[i], CL_DEVICE_MAX_WORK_GROUP_SIZE, "CL_DEVICE_MAX_WORK_GROUP_SIZE");

}

void displayDeviceDetails( cl_device_id id,

cl_device_info param_name,

const char * paramNameAsStr)

{

cl_int error = 0 ;

size_t paramSize = 0 ;

error = clGetDeviceInfo( id ,param_name, 0 , NULL, &paramSize);

if(error != CL_SUCCESS)

{

perror("Unable to obtain device info for param");

return;

}

The cl_device_info are preporcessor directives define in cl.h

switch(param_name)

{

case CL_DEVICE_TYPE :

{

cl_device_type * devType = (cl_device_type*)alloca(sizeof(cl_device_type)*paramSize);

error = clGetDeviceInfo(id, param_name, paramSize, devType, NULL);

if(error != CL_SUCCESS)

{

perror("Unable to optain device info for param\n");

return;

}

switch( *devType)

{

case CL_DEVICE_TYPE_CPU : printf("CPU Detected \n"); break;

case CL_DEVICE_TYPE_GPU : printf("GPU Detected \n"); break;

case CL_DEVICE_TYPE_ACCELERATOR : printf("Accelerator detected\n");break;

case CL_DEVICE_TYPE_DEFAULT : printf("default detected \n"); break;

}

break;

}

case CL_DEVICE_VENDOR_ID :

case CL_DEVICE_MAX_COMPUTE_UNITS :

case CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS : {

cl_uint* ret = (cl_uint*) alloca(sizeof(cl_uint) * paramSize);

error = clGetDeviceInfo( id, param_name, paramSize, ret, NULL );

if (error != CL_SUCCESS ) {

perror("Unable to obtain device info for param\n");

return;

}

switch (param_name) {

case CL_DEVICE_VENDOR_ID: printf("\tVENDOR ID: 0x%x\n", *ret); break;

case CL_DEVICE_MAX_COMPUTE_UNITS: printf("\tMaximum number of parallel compute units: %d\n", *ret); break;

case CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: printf("\tMaximum dimensions for global/local work-item IDs: %d\n", *ret); break;

}

break;

case CL_DEVICE_MAX_WORK_ITEM_SIZES :

{

cl_uint maxWIDimensions;

size_t* ret = (size_t*) alloca(sizeof(size_t) * paramSize);

error = clGetDeviceInfo( id, param_name, paramSize, ret, NULL );

error = clGetDeviceInfo( id, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, sizeof(cl_uint), &maxWIDimensions, NULL );

if (error != CL_SUCCESS ) {

perror("Unable to obtain device info for param\n");

return;

}

printf("\tMaximum number of work-items in each dimension: ( ");

for(cl_int i =0; i < maxWIDimensions; ++i ) {

printf("%d ", (int)ret[i]);

}

printf(" )\n");

}break;

case CL_DEVICE_MAX_WORK_GROUP_SIZE :

{

size_t* ret = (size_t*) alloca(sizeof(size_t) * paramSize);

error = clGetDeviceInfo( id, param_name, paramSize, ret, NULL );

if (error != CL_SUCCESS ) {

perror("Unable to obtain device info for param\n");

return;

}

printf("\tMaximum number of work-items in a work-group: %d\n", (int)*ret);

}

break;

case CL_DEVICE_NAME :

case CL_DEVICE_VENDOR :

{

char data[48];

error = clGetDeviceInfo( id, param_name, paramSize, data, NULL );

if (error != CL_SUCCESS ) {

perror("Unable to obtain device name/vendor info for param\n");

return;

}

switch (param_name) {

case CL_DEVICE_NAME : printf("\tDevice name is %s\n", data);break;

case CL_DEVICE_VENDOR : printf("\tDevice vendor is %s\n", data);break;

}

break;

case CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: {

cl_uint* size = (cl_uint*) alloca(sizeof(cl_uint) * paramSize);

error = clGetDeviceInfo( id, param_name, paramSize, size, NULL );

if (error != CL_SUCCESS ) {

perror("Unable to obtain device name/vendor info for param\n");

return;

}

printf("\tDevice global cacheline size: %d bytes\n", (*size)); break;

} break;

case CL_DEVICE_GLOBAL_MEM_SIZE:

case CL_DEVICE_MAX_MEM_ALLOC_SIZE:

{

cl_ulong* size = (cl_ulong*) alloca(sizeof(cl_ulong) * paramSize);

error = clGetDeviceInfo( id, param_name, paramSize, size, NULL );

if (error != CL_SUCCESS ) {

perror("Unable to obtain device name/vendor info for param\n");

return;

}

switch (param_name) {

case CL_DEVICE_GLOBAL_MEM_SIZE: printf("\tDevice global mem: %ld mega-bytes\n", (*size)>>20); break;

case CL_DEVICE_MAX_MEM_ALLOC_SIZE: printf("\tDevice max memory allocation: %ld mega-bytes\n", (*size)>>20); break;

}

break;

}//end of switch(param_name

}// end of displayDeviceDetailes

void displayPlatformInfo(

cl_platform_id id ,

cl_platform_info param_name,

const char * paramNameAsStr

)

{

cl_int error = 0 ;

size_t paramSize = 0 ;

error = clGetPlatformInfo(

id,

param_name,

NULL,

&paramSize

);

char * moreInfo = (char *) alloca(sizeof(char)*paramSize);

error = clGetPlatformInfo(

id, /* The platform ID returned by clGetPlatformIDs or can be NULL.

If platform is NULL, the behavior is implementation-defined.*/

param_name, /* An enumeration constant that identifies the platform

information being queried. It can be one of the following values

as specified in the table below. */

paramSize, /* Specifies the size in bytes of memory pointed to by param_value.

This size in bytes must be greater than or equal to size of return type

specified in the table below. */

moreInfo, /* A pointer to memory location where appropriate values for a given

param_value will be returned. Acceptable param_value values are listed in

the table below. If param_value is NULL, it is ignored.*/

NULL /* Returns the actual size in bytes of data being queried by param_value.

If param_value_size_ret is NULL, it is ignored */

);

if( error != CL_SUCCESS)

{

if ( error == CL_INVALID_PLATFORM)

perror("Error CL_INVALID_PLATFORM");

else if (error == CL_INVALID_VALUE)

perror("Error CL_INVALID_VALUE");

else

perror("Unable to find any OpenCL platform information");

return ;

}

printf("%s : %s\n" , paramNameAsStr, moreInfo);

} // end of displayPlatformInfo

int main(int argc, const char * argv[])

{

// OpenCL 1.2 Data struction

cl_platform_id * platforms ;

/* OpenCL 1.1 scalar data types */

cl_uint numOfPlatforms ;

cl_int error ;

Get the number of platforms

Remember that for each vendor's SDK installed on the computer,

the number of available platform also increased.

error = clGetPlatformIDs(0, /*

The number of cl_platform_id entries that can be added to platforms.

If platforms is not NULL, the num_entries must be greater than zero.

NULL, /* Returns a list of OpenCL platforms found.

The cl_platform_id values returned in platforms can be used to identify

a specific OpenCL platform. If platforms argument is NULL,

this argument is ignored.

The number of OpenCL platforms returned is the mininum of the value

specified by num_entries or the number of OpenCL platforms available. */

&numOfPlatforms

/* Returns the number of OpenCL platforms available.

If num_platforms is NULL, this argument is ignored. */

);

if ( error < 0)

{

perror("Unable to find any OpenCL platforms");

exit(1);

}

// allocate memory for the number of installed platforms

// alloca(....) occupies some stack space but is

// automatically freed on return

platforms = (cl_platform_id*) alloca(sizeof(cl_platform_id)*numOfPlatforms);

printf("Number of OpenCL platforms found : %d\n", numOfPlatforms);

error = clGetPlatformIDs(numOfPlatforms, /*

The number of cl_platform_id entries that can be added to platforms.

If platforms is not NULL, the num_entries must be greater than zero.

platforms, /* Returns a list of OpenCL platforms found.

The cl_platform_id values returned in platforms can be used to identify

a specific OpenCL platform. If platforms argument is NULL,

this argument is ignored.

The number of OpenCL platforms returned is the mininum of the value

specified by num_entries or the number of OpenCL platforms available. */

NULL

/* Returns the number of OpenCL platforms available.

If num_platforms is NULL, this argument is ignored. */

);

if ( error < 0)

{

perror("Unable to find any OpenCL platforms");

exit(1);

}

// We invoke the API 'clPlatformInfo' twice for each

// parameter we are trying to extract

// and we use the return value to create temporary data

// structure (on the stack) to store

// the returned information ot the second invocation.

for( cl_uint i = 0 ; i < numOfPlatforms ; ++i)

{

displayPlatformInfo( platforms[i], CL_PLATFORM_PROFILE , "CL_PLATFORM_PROFILE");

displayPlatformInfo( platforms[i], CL_PLATFORM_VERSION , "CL_PLATFORM_VERSION");

displayPlatformInfo( platforms[i], CL_PLATFORM_NAME , "CL_PLATFORM_NAME");

displayPlatformInfo( platforms[i], CL_PLATFORM_VENDOR , "CL_PLATFORM_VENDOR");

displayPlatformInfo( platforms[i], CL_PLATFORM_EXTENSIONS , "CL_PLATFORM_EXTENSIONS");

// Assume that we don't know how many devices are OpenCL compliant, we locate everything !

displayDeviceInfo( platforms[i], CL_DEVICE_TYPE_ALL );

}

return 0;

}

실행된 결과는 아래와 같다.

Number of OpenCL platforms found : 1

CL_PLATFORM_PROFILE : FULL_PROFILE

CL_PLATFORM_VERSION : OpenCL 1.2 (Aug 24 2013 21:03:27)

CL_PLATFORM_NAME : Apple

CL_PLATFORM_VENDOR : Apple

CL_PLATFORM_EXTENSIONS : cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

Numnber of detected OpenCL devices 3

CPU Detected

VENDOR ID: 0xffffffff

Maximum number of parallel compute units: 8

Maximum dimensions for global/local work-item IDs: 3

Maximum number of work-items in each dimension: ( 1024 1 1 )

Maximum number of work-items in a work-group: 1024

GPU Detected

VENDOR ID: 0x1022700

Maximum number of parallel compute units: 2

Maximum dimensions for global/local work-item IDs: 3

Maximum number of work-items in each dimension: ( 1024 1024 64 )

Maximum number of work-items in a work-group: 1024

GPU Detected

VENDOR ID: 0x1024400

Maximum number of parallel compute units: 16

Maximum dimensions for global/local work-item IDs: 3

Maximum number of work-items in each dimension: ( 512 512 512 )

Maximum number of work-items in a work-group: 512

저작자표시

'Computer Vision' 카테고리의 다른 글

OpenCL::Query OpenCL kernel (4) (0)	2014.05.02
OpenCL Test Program (3) (0)	2014.03.31
OpenCL test program (1) (0)	2014.03.15
OpenCL Architecture (0)	2014.03.15
Optical Flow에 의한 영상 정보 분석 (0)	2013.05.28

Posted by GUNDAM_IM

Computer Vision2014. 3. 15. 18:03

OpenCL test program (1)

OpenCL test program

책의 예제를 넣어서 실행

책에서 오류가 두군데가 있었다.

(1) ifdef에서 APPLE platform을 확인하는 매크로 정의 오류

== WAS ==

#ifdef APPLE

== IS ==

#ifdef __APPLE__

(2) OpenCL의 원칙은 OpenCL에 정보의 크기를 물어서 확인하고 그만큼 공간을 만든뒤에 그 정보를 받아들이는 것인데

예제에서는 정보를 물어서 공간을 만드는 데 까지는 있지만, 그 이후에 정보를 다시 받아들이는 함수를 호출 하지 않았다.

....

main(...)

{

...

먼저 필요한 공간의 크기를 확인

error = clGetPlatformIDs(0,

NULL,

&numOfPlatforms

);

if ( error < 0)

...

공간을 확보 하고

이제 확보된 공간으로 값을 받아온다.

error = clGetPlatformIDs(numOfPlatforms, <== 이 함수를 호출 하는 것을 추가함.

platforms,

NULL

);

...

}

전체 코드는 아래와 같다.

// main.cpp

// TestOpenCL

// Created by freegear on 2014. 2. 8..

#include <iostream>

#include <stdio.h>

#include <stdlib.h>

#ifdef __APPLE__

#include <OpenCL/opencl.h>

#else

#include <CL/cl.h>

#endif

void displayPlatformInfo(

cl_platform_id id ,

cl_platform_info param_name,

const char * paramNameAsStr

)

{

cl_int error = 0 ;

size_t paramSize = 0 ;

error = clGetPlatformInfo(

id,

param_name,

NULL,

&paramSize

);

char * moreInfo = (char *) alloca(sizeof(char)*paramSize);

error = clGetPlatformInfo(

id, /* The platform ID returned by clGetPlatformIDs or can be NULL.

If platform is NULL, the behavior is implementation-defined.*/

param_name, /* An enumeration constant that identifies the platform

information being queried. It can be one of the following values

as specified in the table below. */

paramSize, /* Specifies the size in bytes of memory pointed to by param_value.

This size in bytes must be greater than or equal to size of return type

specified in the table below. */

moreInfo, /* A pointer to memory location where appropriate values for a given

param_value will be returned. Acceptable param_value values are listed in

the table below. If param_value is NULL, it is ignored.*/

NULL /* Returns the actual size in bytes of data being queried by param_value.

If param_value_size_ret is NULL, it is ignored */

);

if( error != CL_SUCCESS)

{

if ( error == CL_INVALID_PLATFORM)

perror("Error CL_INVALID_PLATFORM");

else if (error == CL_INVALID_VALUE)

perror("Error CL_INVALID_VALUE");

else

perror("Unable to find any OpenCL platform information");

return ;

}

printf("%s : %s\n" , paramNameAsStr, moreInfo);

} // end of displayPlatformInfo

int main(int argc, const char * argv[])

{

// OpenCL 1.2 Data struction

cl_platform_id * platforms ;

/* OpenCL 1.1 scalar data types */

cl_uint numOfPlatforms ;

cl_int error ;

Get the number of platforms

Remember that for each vendor's SDK installed on the computer,

the number of available platform also increased.

error = clGetPlatformIDs(0, /*

The number of cl_platform_id entries that can be added to platforms.

If platforms is not NULL, the num_entries must be greater than zero.

NULL, /* Returns a list of OpenCL platforms found.

The cl_platform_id values returned in platforms can be used to identify

a specific OpenCL platform. If platforms argument is NULL,

this argument is ignored.

The number of OpenCL platforms returned is the mininum of the value

specified by num_entries or the number of OpenCL platforms available. */

&numOfPlatforms

/* Returns the number of OpenCL platforms available.

If num_platforms is NULL, this argument is ignored. */

);

if ( error < 0)

{

perror("Unable to find any OpenCL platforms");

exit(1);

}

// allocate memory for the number of installed platforms

// alloca(....) occupies some stack space but is

// automatically freed on return

platforms = (cl_platform_id*) alloca(sizeof(cl_platform_id)*numOfPlatforms);

printf("Number of OpenCL platforms found : %d\n", numOfPlatforms);

error = clGetPlatformIDs(numOfPlatforms, /*

The number of cl_platform_id entries that can be added to platforms.

If platforms is not NULL, the num_entries must be greater than zero.

platforms, /* Returns a list of OpenCL platforms found.

The cl_platform_id values returned in platforms can be used to identify

a specific OpenCL platform. If platforms argument is NULL,

this argument is ignored.

The number of OpenCL platforms returned is the mininum of the value

specified by num_entries or the number of OpenCL platforms available. */

NULL

/* Returns the number of OpenCL platforms available.

If num_platforms is NULL, this argument is ignored. */

);

if ( error < 0)

{

perror("Unable to find any OpenCL platforms");

exit(1);

}

// We invoke the API 'clPlatformInfo' twice for each

// parameter we are trying to extract

// and we use the return value to create temporary data

// structure (on the stack) to store

// the returned information ot the second invocation.

for( cl_uint i = 0 ; i < numOfPlatforms ; ++i)

{

displayPlatformInfo( platforms[i], CL_PLATFORM_PROFILE , "CL_PLATFORM_PROFILE");

displayPlatformInfo( platforms[i], CL_PLATFORM_VERSION , "CL_PLATFORM_VERSION");

displayPlatformInfo( platforms[i], CL_PLATFORM_NAME , "CL_PLATFORM_NAME");

displayPlatformInfo( platforms[i], CL_PLATFORM_VENDOR , "CL_PLATFORM_VENDOR");

displayPlatformInfo( platforms[i], CL_PLATFORM_EXTENSIONS , "CL_PLATFORM_EXTENSIONS");

}

return 0;

}

실행 결과는 아래와 같다.

Number of OpenCL platforms found : 1

CL_PLATFORM_PROFILE : FULL_PROFILE

CL_PLATFORM_VERSION : OpenCL 1.2 (Aug 24 2013 21:03:27)

CL_PLATFORM_NAME : Apple

CL_PLATFORM_VENDOR : Apple

CL_PLATFORM_EXTENSIONS : cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

저작자표시

'Computer Vision' 카테고리의 다른 글

OpenCL Test Program (3) (0)	2014.03.31
OpenCL Test Program (2) (0)	2014.03.18
OpenCL Architecture (0)	2014.03.15
Optical Flow에 의한 영상 정보 분석 (0)	2013.05.28
OpenCV를 이용한 Face Detection (0)	2013.05.19

Posted by GUNDAM_IM

Computer Vision2014. 3. 15. 18:03

OpenCL Architecture

OpenCL에서 사용되는 개념은 아래와 같다.

Host 즉 CPU에서 Command와 Data를 Device로 보낸다.

이는 Command Queue로 저장된다.

Code의 실행은 순차적으로 진행되다가 Parallel하게 진행되는 방식으로 이루어진다.

Host에서 Serial하게 진행하고 parallel code는 Device에서 실행하게 된다.

Work Item은 N Dimension Work Item으로 만들어진다.

Global Domain으로 1024 x 1024로 만들어진다.

이 Global Domain에서 32x32로 Local Dimension으로 나누어서 한번에 실행된다.

Work Item : OpenCL Device상에서 계산이 이루어지는 기본 유닛

Kernel : Work Item을 구동시키는 code

Program : Work Item 을 구동시키는 Kernel과 기타 함수들의 모음

Context :

Command Queue : Host에서 Device로 보내지는 work의 pipeline

메모리 모델은 아래와 같다.

Private Memory는 각 Work Item별로 할당된 메모리이다.

Local Memory는 Work Group별로 할당된 메모리를 의미한다.

Global / Constant memory는 모든 Work Group에서 보여지는 메모리를 의미한다.

Host Memory는 당연히 Host용 메모리를 의미함.

Host -> Global -> Local -> Private 순으로 메모리의 데이터를 옮길 수 있다.

저작자표시

'Computer Vision' 카테고리의 다른 글

OpenCL Test Program (2) (0)	2014.03.18
OpenCL test program (1) (0)	2014.03.15
Optical Flow에 의한 영상 정보 분석 (0)	2013.05.28
OpenCV를 이용한 Face Detection (0)	2013.05.19
3D Noise Reduction algorithm test (0)	2011.10.05

Posted by GUNDAM_IM

Computer Vision2013. 5. 28. 09:16

Optical Flow에 의한 영상 정보 분석

관련된 알고리즘과 코드는 인터넷에 많이 있습니다.

여기서 테스트한 것은 관련된 코드중 일부를 가지고 와서 조금 수정해서 테스트를 하였습니다.

http://youtu.be/6VIQh8_bX94

LK 알고리즘을 이용한 구현입니다.

저작자표시

'Computer Vision' 카테고리의 다른 글

OpenCL test program (1) (0)	2014.03.15
OpenCL Architecture (0)	2014.03.15
OpenCV를 이용한 Face Detection (0)	2013.05.19
3D Noise Reduction algorithm test (0)	2011.10.05
OpenCV 2.3 Computer vision (4) (4)	2011.09.14

Posted by GUNDAM_IM

Computer Vision2013. 5. 19. 23:30

OpenCV를 이용한 Face Detection

OpenCV를 이용한 Face Detection 입니다.

코드는 길지 않고 인터넷에서도 많이 돌아다니는 코드입니다.

몇가지 테스트할 일이 있어 한번 빌드를 해보았습니다.

저작자표시

'Computer Vision' 카테고리의 다른 글

OpenCL Architecture (0)	2014.03.15
Optical Flow에 의한 영상 정보 분석 (0)	2013.05.28
3D Noise Reduction algorithm test (0)	2011.10.05
OpenCV 2.3 Computer vision (4) (4)	2011.09.14
OpenCV 2.3 Computer vision (3) (0)	2011.09.06

Posted by GUNDAM_IM

Computer Vision2011. 10. 5. 21:07

3D Noise Reduction algorithm test

3D Noise Reducton algorithm 을 테스트 하고 있습니다.

이미지는 아래와 같은데, 이것을 동영상으로 보면 유투브에서 너무 압축이 많이 되어서 잘 드러나지가 않네요

왼쪽 그림이 잡음 제거 전 오른쪽 그림이 잡음 제거 후입니다.
가우시안 잡음을 1단계로 주었을 때의 영상입니다.

위의 영상은 가우시안 잡음을 표준 편차를 0.003으로 주었을때의 영상과 그때의 잡음 제거 결과입니다.
잡음이 정말 많기 때문에 완전하게는 제거를 못하지만, 하여튼 하드웨어로 구현하기에는 쓸만한 성능입니다.

요 영상을 유투브에 올려서 링크를 걸려고 하였는데
동영상은 압축이 너무 많이 되어서 잡음이 오히려 표현이 안되네요

최근에 원더걸스 동영상으로 테스트한 화면입니다.
노이즈 레벨을 강하게 하였습니다.

저작자표시

'Computer Vision' 카테고리의 다른 글

Optical Flow에 의한 영상 정보 분석 (0)	2013.05.28
OpenCV를 이용한 Face Detection (0)	2013.05.19
OpenCV 2.3 Computer vision (4) (4)	2011.09.14
OpenCV 2.3 Computer vision (3) (0)	2011.09.06
OpenCV 2.3 Computer vision (2) (0)	2011.09.05

Posted by GUNDAM_IM

«이전 1 2 3 4 다음»

Dreamer GUNDAM의 블로그::무료로 뉴타입이 되는 별까지 앞으로 356광년

'Computer Vision'에 해당되는 글 34건

증강 현실 테스트

'Computer Vision' 카테고리의 다른 글

OpenCL : Using work item to partition data (5)

'Computer Vision' 카테고리의 다른 글

OpenCL::Query OpenCL kernel (4)

'Computer Vision' 카테고리의 다른 글

OpenCL Test Program (3)

'Computer Vision' 카테고리의 다른 글

OpenCL Test Program (2)

'Computer Vision' 카테고리의 다른 글

OpenCL test program (1)

'Computer Vision' 카테고리의 다른 글

OpenCL Architecture

'Computer Vision' 카테고리의 다른 글

Optical Flow에 의한 영상 정보 분석

'Computer Vision' 카테고리의 다른 글

OpenCV를 이용한 Face Detection

'Computer Vision' 카테고리의 다른 글

3D Noise Reduction algorithm test

'Computer Vision' 카테고리의 다른 글

카테고리

공지사항

태그목록

최근에 올라온 글

최근에 달린 댓글

글 보관함

달력

링크

티스토리툴바

« » 2024.4
일	월	화	수	목	금	토
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30