« 統計科学 | Main | tx bepの内部技術の発表 »

2007.10.28

Bep: 最小完全ハッシュ関数を用いた連想配列

Bepという連想配列のライブラリを公開しました。BSDライセンスです.

キーは文字列限定で,前もって大量のキーと値のペアが前もって分かっている場合(1千万個とか)、使ってもらえるよう最適化しています。(一応、アドホックな方法で一個ずつキーを登録する方法もサポートしています)
特徴は内部に最小完全ハッシュ関数を利用しており少ない作業領域量でありながらそこそこ高速に動くところです.今のところ1千万キーぐらいで動作するのは確認しています.1キーあたり必要な作業領域量は大体3bit + キー自体の長さになります.

最小完全ハッシュ関数の構築自体も面白い問題です.最小完全ハッシュ関数はキー同士が衝突せず、さらにキーの数がn個のときハッシュ値は[0...n-1]が返されるもので、ぎっしり詰まった連番が返されると思ってもよいです。この実現には以下の論文での手法を使いました.3-ハイパーグラフの頂点割り当て問題を解くことで完全ハッシュ関数を作ってます.
各キー3つのハッシュ値、a,b,cを独立に計算し、その3つのハッシュ値を頂点として持つ枝を作ります.で、この頂点、枝からなるグラフを作ります.このグラフが閉路を持たない場合、各キーに枝中の一つの頂点を他のキーと重複しないように割り当てていくことができます.でこの頂点番号をハッシュ値として使います.
Botelho, F.C., Pagh, R. and Ziviani, N. "Simple and Space-Efficient Minimal Perfect Hash Functions" 10th International Workshop on Algorithms and Data Structures (WADS07) August 2007, 139-150
これ(pdf)を使いました).

あと中でいろいろチューニングしてます。

コンパイルできない、結果間違ってる、バグってる、使いづらいとかありましたら連絡ください。できるだけ改善するようにします。よろしくお願いします。

txも地味にアップデートしてます。こちらもいろいろ使ってもらっているようです。

|

« 統計科学 | Main | tx bepの内部技術の発表 »

Comments

前いってた自作DBで使う予定のハッシュをC#で実装してます。特徴は
重複を許し、Interlockedクラスを使ってロックフリーでマルチスレッド対応、削除、更新は考慮しない。
読み込み、書き込みともに並列化できますが、大容量になるとcpu固有のキャッシュではなく共有メモリをアクセスするので並列度が下がります。

Interlockedの実装のようなコンペアアンドスワップ、コンペアアンドアッド、セットアンドテスト(だっけ?)を利用するとかなり軽く作れますね。スピンロックするときは特に最適です。

ついでにマルチスレッドクイックソート作ってみました。確かに並列化できてパフォーマンスあがりましたが上位の大きな分割はマルチスレッドできないのでここが並列度を上げれない原因です。何とか分割処理をInterlocked使って並列にやってみましたがオーバヘッドの方が大きくうまくいきませんでした。

Posted by: | 2007.11.03 at 09:19 AM

まだokaさんの最小完全ハッシュ見てませんが、高速に出来るのであれば理解してDB用に実装してみたいと思います。

Posted by: | 2007.11.03 at 10:12 AM

ぜひよろしければソースコードとかサンプルコードを公開していただけると勉強できるので教えてください。

マルチスレッドになってさらに局所性(しかも4kBとか8kBとかのオーダー)が求められるのはアルゴリズム考える側としても面白い問題ですね。

Posted by: oka | 2007.11.04 at 12:16 PM

ごめんなさい。今度こそ大丈夫です。これ以前のコードを削除してもらえませんか?
探索テスト用のスレッドを作って並列化していないのが原因でした。やはりきれいに2倍にはならないですがそこそこ早くなります。


using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Diagnostics;
struct 構造体 {
public Double キー;
public Double データ;
}
unsafe class 並列ハッシュ{
public static int CPU数=1;//Environment.ProcessorCount;//で実CPU数になる
static int ハッシュ要素数_;
static 構造体* pデータ配列;
static Int64* pハッシュ配列;
delegate void ハッシュ配列null初期化delegate(int ハッシュ開始,int ハッシュ終了);
private static void ハッシュ配列null初期化(int ハッシュ開始,int ハッシュ終了) {
Int64* ppハッシュ配列=(Int64*)pハッシュ配列;
for(int a=ハッシュ開始;aキー==キー) return pデータ;
ハッシュ値=(ハッシュ値+1)%ハッシュ要素数_;
if(ハッシュ値==開始ハッシュ値) return null;
}
}
private static void データ配列から重複有りハッシュ配列作成(int データ開始,int データ終了) {
for(int a=データ開始;aキー==キー) {
if(Interlocked.CompareExchange(ref pハッシュ配列[ハッシュ値],unchecked((Int64)0xFFFFFFFFFFFFFFFF),pハッシュ)==pハッシュ) {
return p構造体;
}
}
}
ハッシュ値=(ハッシュ値+1)%ハッシュ要素数_;
if(ハッシュ値==開始ハッシュ値) return null;
}
}
}
unsafe class A {
const int データ数 =1000000;
const int ハッシュ要素数=1100000;
static 構造体[] データ配列=new 構造体[データ数];
static Int64[] ハッシュ配列=new Int64[ハッシュ要素数];
delegate void 探索delegate(int 開始,int 終了);
static void 重複無し探索(int 開始,int 終了) {
for(int a=開始;aキー==キー);
Debug.Assert(pデータ->データ==pデータ->キー*2);
}
}
static void 重複有り探索(int 開始,int 終了) {
for(int a=開始;aキー==キー);
Debug.Assert(pデータ->データ==pデータ->キー*2);
}
}
static void Main() {
TimeSpan 重複無し初期化時間,重複無し検索時間;
{
for(int a=0;a<データ数;a++) {
データ配列[a].キー=(Double)a;
データ配列[a].データ=データ配列[a].キー*2;
}
fixed(構造体* pデータ配列_=&データ配列[0]) {
fixed(Int64* pハッシュ配列_=&ハッシュ配列[0]) {
DateTime t1=DateTime.Now;
並列ハッシュ.データ配列から重複無しハッシュ配列作成(pハッシュ配列_,ハッシュ要素数,pデータ配列_,データ数);
重複無し初期化時間=DateTime.Now.Subtract(t1);
DateTime t2=DateTime.Now;
探索delegate T=new 探索delegate(重複無し探索);
IAsyncResult[] IAsyncResults=new IAsyncResult[並列ハッシュ.CPU数-1];
for(int a=0;a<並列ハッシュ.CPU数-1;a++) {
IAsyncResults[a]=T.BeginInvoke(データ数*a/並列ハッシュ.CPU数,データ数*(a+1)/並列ハッシュ.CPU数,null,null);
}
重複無し探索(データ数*(並列ハッシュ.CPU数-1)/並列ハッシュ.CPU数,データ数*並列ハッシュ.CPU数/並列ハッシュ.CPU数);
for(int a=0;a<並列ハッシュ.CPU数-1;a++) {
T.EndInvoke(IAsyncResults[a]);
}
重複無し検索時間=DateTime.Now.Subtract(t2);
}
}
}
TimeSpan 重複有り初期化時間,重複有り検索時間;
{
for(int a=0;a<データ数;a++) {
データ配列[a].キー=(Double)a/4;
データ配列[a].データ=データ配列[a].キー*2;
}
fixed(構造体* pデータ配列_=&データ配列[0]) {
fixed(Int64* pハッシュ配列_=&ハッシュ配列[0]) {
DateTime t1=DateTime.Now;
並列ハッシュ.データ配列から重複有りハッシュ配列作成(pハッシュ配列_,ハッシュ要素数,pデータ配列_,データ数);
重複有り初期化時間=DateTime.Now.Subtract(t1);
DateTime t2=DateTime.Now;
探索delegate T=new 探索delegate(重複有り探索);
IAsyncResult[] IAsyncResults=new IAsyncResult[並列ハッシュ.CPU数-1];
for(int a=0;a<並列ハッシュ.CPU数-1;a++) {
IAsyncResults[a]=T.BeginInvoke(データ数*a/並列ハッシュ.CPU数,データ数*(a+1)/並列ハッシュ.CPU数,null,null);
}
重複有り探索(データ数*(並列ハッシュ.CPU数-1)/並列ハッシュ.CPU数,データ数*並列ハッシュ.CPU数/並列ハッシュ.CPU数);
for(int a=0;a<並列ハッシュ.CPU数-1;a++) {
T.EndInvoke(IAsyncResults[a]);
}
重複有り検索時間=DateTime.Now.Subtract(t2);
}
}
}
}
}

Posted by: | 2007.11.04 at 11:34 PM

あれー中括弧とかが消えてしまう…。

Posted by: | 2007.11.05 at 12:16 PM

うむむ。ここのコメントにソースは書きづらいですねぇ。まぁ気持ちが分かるので大丈夫ですよ。
見慣れない予約語がいろいろある・・

Posted by: oka | 2007.11.05 at 11:05 PM

Great information. Lucky me I discovered your site by chance (stumbleupon). I've bookmarked it for later!

Posted by: Mamie | 2014.05.02 at 04:52 AM

The only thing you have to do is to utilize this laser leveler tool which will project a visible laser-cross on the walls by simultaneously, displaying accurate horizontal and vertical lines within seconds. Building a shed on your own is beneficial in many ways. Pointing to his disciples, he said, 'Here are my mother and my brothers.

Posted by: Julius | 2014.05.26 at 09:02 PM

What you may not know is the amount of education and strict standards carpenters must qualify for to be considered as professionals, especially in Canada. Being able to work with a wide range of clamps at your fingertips is not hard to do because they are low-priced and easily available. Reaching the perfect horizontal or vertical line is not that easy with the traditional accessories such as bubble leveler.

Posted by: Parthenia | 2014.05.27 at 09:03 AM

The only thing you have to do is to utilize this laser leveler tool which will project a visible laser-cross on the walls by simultaneously, displaying accurate horizontal and vertical lines within seconds. Being able to work with a wide range of clamps at your fingertips is not hard to do because they are low-priced and easily available. Pointing to his disciples, he said, 'Here are my mother and my brothers.

Posted by: Jon Fahy Carpentry | 2014.05.27 at 09:42 PM

After I initially left a comment I seem to have clicked on the -Notify me when new comments are added- checkbox and from now on every time a comment is added I recieve 4 emails with the exact same comment. Is there a way you can remove me from that service? Thanks!

Posted by: law | 2014.07.17 at 12:55 PM

Why viewers still make use of to read news papers when in this technological world all is existing on web?

Posted by: lasmission.com | 2014.07.20 at 06:18 AM

Hi there, everything is going perfectly here and ofcourse every one is sharing information, that's truly excellent, keep up writing.

Posted by: Dominic | 2014.08.02 at 01:12 PM

I get pleasure from, cause I found just what I was taking a look for. You've ended my 4 day long hunt! God Bless you man. Have a great day. Bye

Posted by: germanwisp.de | 2014.09.06 at 12:10 AM

I know this web page presents quality dependeent posts and additional stuff, is there any other site which provides these information in quality?

Posted by: humdrumnerve3055.snack.ws | 2014.09.10 at 04:36 AM

Hello, i read your blog from time to time and i own a similar one and i was just wondering if you get a lot of spam remarks? If so how do you prevent it, any plugin or anything you can advise? I get so much lately it's driving me crazy so any support is very much appreciated.

Posted by: Ara | 2014.09.12 at 07:22 PM

Oh my goodness! Incredible article dude! Thank you so much, However I am going through difficulties with your RSS. I don't understand the reason why I cannot join it. Is there anybody else having the same RSS issues? Anybody who knows the solution can you kindly respond? Thanx!!

Posted by: Ignacio | 2014.09.13 at 03:45 PM

Hello, after reading this remarkable piece of writing i am as well cheerful to share my experience here with mates.

Posted by: schaumburg divorce attorneys | 2014.09.16 at 04:21 PM

Wow, fantastic blog structure! How lengthy have you been blogging for? you make running a blog look easy. The entire look of your site is excellent, let alone the content material!

Posted by: Barbara | 2014.09.18 at 02:10 AM

Wow, that's what I was exploring for, what a stuff! present here at this webpage, thanks admin of this web page.

Posted by: Magaret | 2014.09.18 at 08:58 AM

Hi there colleagues, how is all, and what you would like to say concerning this article, in my view its genuinely remarkable designed for me.

Posted by: personal injury form | 2014.09.20 at 10:21 PM

Greetings from Los angeles! I'm bored at work so I decided to browse your blog on my iphone during lunch break. I really like the info you provide here and can't wait to take a look when I get home. I'm shocked at how fast your blog loaded on my mobile .. I'm not even using WIFI, just 3G .. Anyhow, great site!

Posted by: personal injury update 2013 | 2014.09.21 at 05:30 AM

Undeniably consider that that you stated. Your favourite justification seemed to be at the web the easiest factor to take into accout of. I say to you, I definitely get irked while other people think about worries that they plainly do not understand about. You controlled to hit the nail upon the highest as well as defined out the whole thing without having side-effects , other folks can take a signal. Will probably be again to get more. Thank you

Posted by: shopping websites | 2014.09.25 at 04:12 PM

Do you have a spam problem on this blog; I also am a blogger, and I was curious about your situation; we have created some nice methods and we are looking to swap techniques with other folks, be sure to shoot me an e-mail if interested.

Posted by: www.plurk.com | 2014.09.25 at 11:16 PM

Thanks for sharing your thoughts on personal injury york. Regards

Posted by: rstat.us | 2014.09.28 at 09:05 PM

I know this if off topic but I'm looking into starting my own blog and was wondering what all is required to get setup? I'm assuming having a blog like yours would cost a pretty penny? I'm not very internet savvy so I'm not 100% certain. Any tips or advice would be greatly appreciated. Kudos

Posted by: personal injury insurance coverage | 2014.09.30 at 02:17 PM

Thanks in favor of sharing such a pleasant thinking, paragraph is good, thats why i have read it fully

Posted by: Christin | 2014.10.11 at 11:54 PM

Superb, what a web site it is! This blog provides useful facts to us, keep it up.

Posted by: Frankie | 2014.10.13 at 03:34 AM

Hmm is anyone else experiencing problems with the images on this blog loading? I'm trying to find out if its a problem on my end or if it's the blog. Any feed-back would be greatly appreciated.

Posted by: accident at work | 2014.10.14 at 11:09 PM

Hi, this weekend is fastidious for me, for the reason that this time i am reading this fantastic educational paragraph here at my residence.

Posted by: room air Purifiers | 2014.11.08 at 11:43 PM

Can I simply just say what a comfort to find someone that really knows what they are discussing on the net. You definitely understand how to bring a problem to light and make it important. More people really need to check this out and understand this side of the story. It's surprising you are not more popular because you certainly possess the gift.

Posted by: http://www.youtube.com/watch?v=ZOZ1PcD-Yrk | 2014.11.15 at 05:59 AM

I am really loving the theme/design of your site. Do you ever run into any browser compatibility problems? A small number of my blog visitors have complained about my site not operating correctly in Explorer but looks great in Chrome. Do you have any tips to help fix this issue?

Posted by: chapter 7 un charter | 2014.11.16 at 02:20 PM

After looking over a number of the articles on your web site, I really like your technique of writing a blog. I saved as a favorite it to my bookmark webpage list and will be checking back soon. Please check out my website as well and tell me your opinion.

Posted by: Bankruptcy Duluth MN | 2014.11.21 at 09:33 PM

Thanks for the good writeup. It if truth be told used to be a leisure account it. Look complicated to more delivered agreeable from you! However, how could we be in contact?

Posted by: Deidre | 2014.11.22 at 09:16 PM

This website was... how do I say it? Relevant!! Finally I have found something which helped me. Thanks!

Posted by: free bankruptcy attorneys | 2014.11.23 at 08:40 AM

Hi there! Someone in my Myspace group shared this site with us so I came to check it out. I'm definitely enjoying the information. I'm book-marking and will be tweeting this to my followers! Exceptional blog and fantastic design and style.

Posted by: bankruptcy attorney | 2014.11.29 at 11:01 AM

Thank you for any other fantastic article. Where else could anyone get that type of info in such an ideal approach of writing? I've a presentation next week, and I'm on the search for such info.

Posted by: attorney today | 2014.11.30 at 11:10 AM

Hi there, i read your blog from time to time and i own a similar one and i was just wondering if you get a lot of spam feedback? If so how do you stop it, any plugin or anything you can advise? I get so much lately it's driving me crazy so any help is very much appreciated.

Posted by: Edgardo | 2014.12.01 at 01:40 PM

Hello, yup this piece of writing is in fact fastidious and I have learned lot of things from it about blogging. thanks.

Posted by: credit rebuilding | 2014.12.01 at 01:55 PM

Pretty component to content. I simply stumbled upon your blog and in accession capital to assert that I acquire actually loved account your blog posts. Anyway I'll be subscribing for your augment or even I success you get admission to constantly fast.

Posted by: issuu.com | 2014.12.01 at 06:41 PM

WOW just what I was looking for. Came here by searching for credit cards

Posted by: Armando | 2014.12.01 at 10:43 PM

Thank you for sharing your thoughts. I really appreciate your efforts and I will be waiting for your next post thanks once again.

Posted by: habitualincubus14.blogs.experienceproject.com | 2014.12.03 at 05:54 AM

I am in fact thankful to the owner of this web page who has shared this fantastic post at here.

Posted by: Bessie | 2014.12.03 at 06:25 AM

Good post. I learn something new and challenging on sites I stumbleupon every day. It's always exciting to read content from other authors and practice a little something from their web sites.

Posted by: Gerard | 2014.12.03 at 11:10 AM

Your style is really unique compared to other folks I've read stuff from. Many thanks for posting when you have the opportunity, Guess I'll just bookmark this blog.

Posted by: Brandie | 2014.12.04 at 02:39 AM

Thank you for some other informative web site. The place else may just I am getting that type of info written in such an ideal way? I have a undertaking that I am simply now running on, and I have been at the look out for such info.

Posted by: Carroll | 2014.12.04 at 04:16 AM

Thank you for sharing your info. I truly appreciate your efforts and I will be waiting for your further write ups thanks once again.

Posted by: Elton | 2014.12.04 at 07:06 PM

Hurrah! At last I got a blog from where I be capable of in fact get useful information regarding my study and knowledge.

Posted by: unsecured debts- | 2014.12.05 at 12:35 AM

Hey very nice blog!

Posted by: Mellissa | 2014.12.20 at 08:27 AM

This is very interesting, You are a very skilled blogger. I've joined your feed and look forward to seeking more of your wonderful post. Also, I've shared your site in my social networks!

Posted by: http://www.ralph-engelmann.com/?p=17831 | 2015.01.15 at 12:58 AM

We can hack someone's Fb private life for you within moments; your target person will not understand that his GMail account has been revealed.

Posted by: hack a facebook account 2014 | 2015.01.17 at 05:31 PM

I was able to find good info from your articles.

Posted by: http://www.polihasnur.ac.id/halkomentar-134-hasil-seleksi-test-tertulis-penerimaan-dosen-dan-teknisi | 2015.01.19 at 06:33 AM

Hello to all, how is everything, I think every one is getting more from this web site, and your views are good in favor of new people.

Posted by: match.com | 2015.04.05 at 09:15 PM

Sharing and Communication: The 4. Also you can unlock global high scores and hard modes in the shop. The chapters have been organized so that the difficulty increases progressively with each chapter.

Posted by: Kaylee | 2015.06.30 at 11:28 PM

We absolutely love your blog and find almost all of your post's to be exactly what I'm looking for. can you offer guest writers to write content in your case? I wouldn't mind producing a post or elaborating on many of the subjects you write related to here. Again, awesome web log!

Posted by: foot condom | 2015.09.09 at 04:13 AM

Post a comment



(Not displayed with comment.)




TrackBack

TrackBack URL for this entry:
http://app.cocolog-nifty.com/t/trackback/3041/16902200

Listed below are links to weblogs that reference Bep: 最小完全ハッシュ関数を用いた連想配列:

« 統計科学 | Main | tx bepの内部技術の発表 »