English 中文(简体)
10 character id that s globally and locally unique
原标题:

I need to generate a 10 character unique id (SIP/VOIP folks need to know that it s for a param icid-value in the P-Charging-Vector header). Each character shall be one of the 26 ASCII letters (case sensitive), one of the 10 ASCII digits, or the hyphen-minus.

It MUST be globally unique (outside of the machine generating the id) and sufficiently locally unique (within the machine generating the id) , and all that needs to be packed into 10 characters, phew!

Here s my take on it. I m FIRST encoding the MUST be encoded globally unique local ip address into base-63 (its an unsigned long int that will occupy 1-6 characters after encoding) and then as much as I can of the current time stamp (its a time_t/long long int that will occupy 9-4 characters after encoding depending on how much space the encoded ip address occupies in the first place).

I ve also added loop count i to the time stamp to preserve the uniqueness in case the function is called more than once in a second.

Is this good enough to be globally and locally unique or is there another better approach?

Gaurav

#include <stdio.h>
#include <string.h>
#include <sys/time.h>

//base-63 character set
static char set[]="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-";

// b63() returns the next vacant location in char array x
int b63(long long longlong,char *x,int index){
    if(index > 9)
        return index+1;

    //printf("index=%d,longlong=%lld,longlong%63=%lld
",index,longlong,longlong%63);
    if(longlong < 63){
        x[index] = set[longlong];
        return index+1;
    }  

    x[index] = set[longlong%63];
    return b63(longlong/63,x,index+1);
}

int main(){
    char x[11],y[11] = {0}; /*    is taken care of here */

    //let s generate 10 million ids
    for(int i=0; i<10000000; i++){

        /*  add i to timestamp to take care of sub-second function calls,
            3770168404(is a sample ip address in n/w byte order) =                84.52.184.224 */
        b63((long long)time(NULL)+i,x,b63((long long)3770168404,x,0));

        // reverse the char array to get proper base-63 output
        for(int j=0,k=9; j<10; j++,k--)
            y[j] = x[k];

        printf("%s
",y);
    }

    return 0;
}
问题回答

It MUST be globally unique (outside of the machine generating the id) and sufficiently locally unique (within the machine generating the id) , and all that needs to be packed into 10 characters, phew!

Are you in control of all the software generating ids? Are you doling out the ids? If not...

I know nothing about SIP, but there s got to be a misunderstanding that you have about the spec (or the spec must be wrong). If another developer attempts to build an id using a different algorithm than the one you ve cooked up, you will have collisions with their ids, meaning they will know longer be globally unique in that system.

I d go back to the SIP documentation, see if there s an appendix with an algorithm for generating these ids. Or maybe a smarter SO user than I can answer what the SIP algorithm for generating these id s is.

I would have a serious look at RFC 4122 which describes the generation of 128-bit GUIDs. There are several different generation algorithms, some of which may fit (MAC address-based one springs to mind). This is a bigger number-space than yours 2^128 = 3.4 * 10^38 compared with 63^10 = 9.8 * 10^17, so you may have to make some compromises on uniqueness. Consider factors like how frequently the IDs will be generated.

However in the RFC, they have considered some practical issues, like the ability to generate large numbers of unique values efficiently by pre-allocating blocks of IDs.

Can t you just have a distributed ID table ?

Machines on NAT ed LANs will often have an IP from a small range, and not all of the 32-bit values would be valid (think multicast, etc). Machines may also grab the same timestamp, especially if the granularity is large (such as seconds); keep in mind that the year is very often going to be the same, so it s the lower bits that will give you the most uniqueness .

You may want to take the various values, hash them with a cryptographic hash, and translate that to the characters you are permitted to use, truncating to the 10 characters.

But you re dealing with a value with less than 60 bits; you need to think carefully about the implications of a collision. You might be approaching the problem the wrong way...

Well, if I cast aside the fact that I think this is a bad idea, and concentrate on a solution to your problem, here s what I would do:

You have an id range of 10^63, which correspond to roughly 60 bits. You want it to be both "globally" and "locally" unique. Let s generate the first N bits to be globally unique, and the rest to be locally unique. The concatenation of the two will have the properties you are looking for.

First, the global uniqueness : IP won t work, especially local ones, they hold very little entropy. I would go with MAC addresses, they were made for being globally unique. They cover a range of 256^6, so using up 6*8 = 48 bits.

Now, for the locally unique : why not use the process ID ? I m making the assumption that the uniqueness is per process, if it s not, you ll have to think of something else. On Linux, process ID is 32 bits. If we wanted to nitpick, the 2 most significant bytes probably hold very little entropy, as they would at 0 on most machines. So discard them if you know what you re doing.

So now you ll see you have a problem as it would use up to 70 bits to generate a decent (but not bulletproof) globally and locally unique ID (using my technique anyway). And since I would also advise to put in a random number (at least 8 bits long) just in case, it definitely won t fit. So if I were you, I would hash the ~78 generated bits to SHA1 (for example), and convert the first 60 bits of the resulting hash to your ID format. To do so, notice that you have a 63 characters range to chose from, so almost the full range of 6 bits. So split the hash in 6 bits pieces, and use the first 10 pieces to select the 10 characters of your ID from the 63 character range. Obviously, the range of 6 bits is 64 possible values (you only want 63), so if you have a 6 bits piece equals to 63, either floor it to 62 or assume modulo 63 and pick 0. It will slightly bias the distribution, but it s not too bad.

So there, that should get you a decent globally and locally pseudo-unique ID.

A few last points: according to the Birthday paradox, you ll get a ~ 1 % chance of collisions after generating ~ 142 million IDs, and a 99% chance after generating 3 billions IDs. So if you hit great commercial success and have millions of IDs being generated, get a larger ID.

Finally, I think I provided a "better than the worse" solution to your problem, but I can t help but think you re attacking this problem in the wrong fashion, and possibly as other have mentioned, misreading the specs. So use this if there are no other ways that would be more "bulletproof" (centralised ID provider, much longer ID ... ).

Edit: I re-read your question, and you say you call this function possibly many times a second. I was assuming this was to serve as some kind of application ID, generated once at the start of your application, and never changed afterwards until it exited. Since it s not the case, you should definitely add a random number and if you generate a lot of IDs, make that at least a 32 bits number. And read and re-read the Birthday Paradox I linked to above. And seed your number generator to a highly entropic value, like the usec value of the current timestamp for example. Or even go so far as to get your random values from /dev/urandom . Very honestly, my take on your endeavour is that 60 bits is probably not enough...

Hmm, using the system clock may be a weakness... what if someone sets the clock back? You might re-generate the same ID again. But if you are going to use the clock, you might call gettimeofday() instead of time(); at least that way you ll get better resolution than one second.

@Doug T. No, I m not in control of all the software generating the ids. I agree without a standardized algorithm there maybe collisions, I ve raised this issue in the appropriate mailing lists.

@Florian Taking a cue from you re reply. I decided to use the /dev/urandom PRNG for a 32 bit random number as the space unique component of the id. I assume that every machine will have its own noise signature and it can be assumed to be safely globally unique in space at an instant of time. The time unique component that I used earlier remains the same.

These unique ids are generated to collate all the billing information collected from different network functions that independently generated charging information of a particular call during call processing.

Here s the updated code below:

Gaurav

 #include <stdio.h>
 #include <string.h>
 #include <time.h>

 //base-63 character set
 static char set[]="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-";

 // b63() returns the next vacant location in char array x
 int b63(long long longlong, char *x, int index){
     if(index > 9)
         return index+1;

     if(longlong < 63){
         x[index] = set[longlong];
         return index+1;
     }  

     x[index] = set[longlong%63];
     return b63(longlong/63, x, index+1);
 }

 int main(){
     unsigned int number;
     char x[11], y[11] = {0};

     FILE *urandom = fopen("/dev/urandom", "r");
     if(!urandom)
         return -1;

     //let s generate a 1 billion ids
     for(int i=0; i<1000000000; i++){

         fread(&number, 1, sizeof(number), urandom);

         // add i to timestamp to take care of sub-second function calls, 
         b63((long long)time(NULL)+i, x, b63((long long)number, x, 0));

         // reverse the char array to get proper base-63 output
         for(int j=0, k=9; j<10; j++, k--)
             y[j] = x[k];

         printf("%s
", y);
     }

     if(urandom)
     fclose(urandom);

     return 0;
 }




相关问题
Undefined reference

I m getting this linker error. I know a way around it, but it s bugging me because another part of the project s linking fine and it s designed almost identically. First, I have namespace LCD. Then I ...

C++ Equivalent of Tidy

Is there an equivalent to tidy for HTML code for C++? I have searched on the internet, but I find nothing but C++ wrappers for tidy, etc... I think the keyword tidy is what has me hung up. I am ...

Template Classes in C++ ... a required skill set?

I m new to C++ and am wondering how much time I should invest in learning how to implement template classes. Are they widely used in industry, or is this something I should move through quickly?

Print possible strings created from a Number

Given a 10 digit Telephone Number, we have to print all possible strings created from that. The mapping of the numbers is the one as exactly on a phone s keypad. i.e. for 1,0-> No Letter for 2->...

typedef ing STL wstring

Why is it when i do the following i get errors when relating to with wchar_t? namespace Foo { typedef std::wstring String; } Now i declare all my strings as Foo::String through out the program, ...

C# Marshal / Pinvoke CBitmap?

I cannot figure out how to marshal a C++ CBitmap to a C# Bitmap or Image class. My import looks like this: [DllImport(@"test.dll", CharSet = CharSet.Unicode)] public static extern IntPtr ...

Window iconification status via Xlib

Is it possible to check with the means of pure X11/Xlib only whether the given window is iconified/minimized, and, if it is, how?

热门标签