7th European BSD Conference: Oct 18-19 2008, Strasbourg, France

Dynamic memory allocation for dirhash in UFS2

Nick Barkas

Abstract
Hello, my name is Nick Barkas. I'm a master's student in scientific computing at Kungliga Tekniska högskolan (KTH) in Stockholm, Sweden. I have just begun work on a Google Summer of Code project with FreeBSD, "Dynamic memory allocation for dirhash in UFS2", and I would like to present the results of this project at EuroBSDCon this year. Because the project is still very much a work in progress, it is difficult to summarize exactly what I will ultimately present, but I can give an outline of the talk.

First, I will give background on dirhash: the structure of directories in UFS2, why lookups in this structure require a linear search, and how dirhash speeds these lookups up without changing anything about the directory data structure.

Next, I will explain a current limitation: the maximum amount of memory dirhash may use must be set manually by the administrator, or else left at a small, conservative default of 2 MB. I will describe the methods I will have explored for making this limit grow and shrink dynamically as the system has more or less free memory, and which method I will have ultimately settled on and implemented. I will then present test results for operations on very large directories, with and without dynamic memory allocation enabled for dirhash.

After that, I will discuss how the speed gains from dirhash are limited by the fact that its hash tables exist only in memory. They must be rebuilt after each boot as large directories are scanned for the first time, and a directory that has not been accessed for some time may need its hash rebuilt if its dirhash was discarded to free memory. An on-disk index of directory entries would eliminate these problems. I will talk about some of the challenges of implementing on-disk indexing, such as remaining backwards compatible with older versions of UFS2 and interoperating properly with soft updates.

Then, if my SoC project has left me time to work on this aspect, I will explain some possible methods for adding directory indexing to UFS2 that meet these challenges, and which of them I will have implemented. Finally, I will present benchmark results for this filesystem with indices, and compare them with its performance using dirhash and using neither indices nor dirhashes.
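To make the lookup problem concrete, here is a minimal C sketch under simplifying assumptions. The directory is modeled as a flat array of fixed-size entries (`struct dent` is hypothetical; real UFS2 directories hold variable-length records packed into directory blocks). The index is a small open-addressing hash table with FNV-1a hashing and linear probing, chosen for brevity and not taken from the FreeBSD dirhash code. The idea it illustrates is the one described above: the index is built in memory with a single scan and stores only the positions of entries, so the directory itself is never modified, while a lookup without an index has to examine entries one by one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified directory entry (not the real UFS2 layout). */
struct dent {
    uint32_t ino;           /* inode number; 0 marks an unused slot */
    char name[32];          /* NUL-terminated file name */
};

/* Without an index: scan every entry until the name matches.
 * Returns the entry's position or -1; *examined counts entries checked. */
long dir_linear_lookup(const struct dent *dir, size_t n, const char *name,
                       size_t *examined)
{
    size_t i;

    assert(examined != NULL);
    *examined = 0;
    for (i = 0; i < n; i++) {
        (*examined)++;
        if (dir[i].ino != 0 && strcmp(dir[i].name, name) == 0)
            return (long)i;
    }
    return -1;
}

#define HASH_SLOTS 1024     /* power of two; must exceed the entry count */
#define SLOT_EMPTY (-1L)

/* In-memory index: each slot holds the position of a directory entry. */
struct dirhash_sketch {
    long slot[HASH_SLOTS];
};

static uint32_t name_hash(const char *s)
{
    uint32_t h = 2166136261u;            /* FNV-1a offset basis */

    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;                  /* FNV-1a prime */
    }
    return h;
}

/* Build the index with one pass; the directory array is only read. */
int dirhash_build(struct dirhash_sketch *dh, const struct dent *dir, size_t n)
{
    size_t i;

    if (n >= HASH_SLOTS)
        return -1;                       /* table too small for this directory */
    for (i = 0; i < HASH_SLOTS; i++)
        dh->slot[i] = SLOT_EMPTY;
    for (i = 0; i < n; i++) {
        uint32_t s;

        if (dir[i].ino == 0)
            continue;                    /* skip unused entries */
        s = name_hash(dir[i].name) & (HASH_SLOTS - 1);
        while (dh->slot[s] != SLOT_EMPTY)    /* linear probing */
            s = (s + 1) & (HASH_SLOTS - 1);
        dh->slot[s] = (long)i;
    }
    return 0;
}

/* With the index: probe from the name's home slot until a match or an
 * empty slot. Returns the entry's position or -1. */
long dirhash_lookup(const struct dirhash_sketch *dh, const struct dent *dir,
                    const char *name, size_t *examined)
{
    uint32_t s = name_hash(name) & (HASH_SLOTS - 1);

    assert(examined != NULL);
    *examined = 0;
    while (dh->slot[s] != SLOT_EMPTY) {
        long pos = dh->slot[s];

        (*examined)++;
        if (strcmp(dir[pos].name, name) == 0)
            return pos;
        s = (s + 1) & (HASH_SLOTS - 1);
    }
    return -1;
}
```

In a directory of 500 entries, `dir_linear_lookup` examines all 500 entries to find the last one or to report a missing name, whereas `dirhash_lookup` usually examines only one or a few. The cost is the memory held by `struct dirhash_sketch`, and the index disappears when that memory is freed or the system reboots, which is why both the memory limit and on-disk indexing matter.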
Keywords
dirhash, ufs2, filesystems, performance tuning