====================================================================== PowerBASIC Gazette - Electronic Edition Volume 1 - Issue 4 PowerBASIC, Inc. (800) 780-7707 Sales 1978 Tamiami Trail S. #200 (941) 408-8700 Voice Venice, FL 34293-5006 (941) 408-8820 Fax Visit us on the World Wide Web at http://www.powerbasic.com Email Sales: sales@powerbasic.com This newsletter is only sent to email addresses in our subscription list. If you have received this newsletter by mistake or no longer wish to receive it, please send a simple unsubscribe request to support@powerbasic.com with your name and zip/postal code. This newsletter is best viewed with a fixed-width font. ====================================================================== What's Inside this Issue? - Applications Need Some Juice? Tap PowerTree! - New telephone area code for PowerBASIC - Subscription information for the PowerBASIC Gazette ====================================================================== Applications Need Some Juice? Tap PowerTree! --------------------------------------------- by Bruce Tonkin and Robert Zale - PowerBASIC Inc. Almost every useful program needs a data file of some kind. In fact, in many cases, the data is the entire reason for the existence of the program! But a key requirement is the need to find the specific data you need, quickly and accurately. You may need ten million customer names in alphabetical order... many mailers in zip code sequence... perhaps help to find an employee with a name that sounds like Smith (or could it be Schmidt, or Smit, or Smythe)... or even the urgent need to catalog and count all the words in the latest report from the Independent Prosecutor? All of these common problems can quickly benefit by applying the concepts of a good BTree Manager. Just in case you're a tiny bit "rusty" with all of these terms, let's briefly describe them: A BTree Manager is a general purpose library of functions, used to store and retrieve data quickly and efficiently. It does this by creating an index (a map, so to speak) of the important elements of the data, and information on where the data may be found. The index and the data may be intertwined in a single file, or they may be logically separated. There may be one index per data file, or many of them. Perhaps one index to yet another index. You might even create an index with no data at all! The real point here is fundamental... with a truly versatile BTree Manager, the sky is the limit. Generally speaking, each entry in the index contains a "Key" and a "Pointer". The Key is usually a string of something less than 100 bytes, and the Pointer is typically a 32-bit long integer. The secret is to use this simple information to manage fast access to your data in just the manner you need it. Not in a fashion dictated by the authors, but in just the manner and form you desire, to optimize the performance of your application code. Let's take a simple mailing list of ten million customers. You might wish to access them by name, the sound of the name (more about soundex codes later), or the zip code. So just create three indexes. For every customer, add an entry to each index with keys like "KALE", "K400", and "95066". But all would use the same pointer, the record number of Mr. Kale's main data record. Later, by using the other functions, you can retrieve your data sequentially, by exact match, by approximate match, or even by a sound-alike algorithm. Just in case you hadn't already guessed it, PowerBASIC Inc. recently released PowerTREE... a very efficient BTree Manager, with versions for DOS or Windows (both 16-bit and 32-bit). When the PowerTREE project was first conceived, the planners laid out a number of fairly lofty goals: It must be exceedingly fast, affordable, and use just a very small memory footprint. It must be based upon the prized B+Tree algorithm, allow duplicate keys, and must support networked, even multi-threaded applications. It must offer very large key size, and work well with virtually any format of data file. We think we've succeeded on every count. Read on a bit, and see if you agree... Binary Trees, B-Trees, and B+Trees Most every indexing scheme is based upon a tree structure, the first and oldest being the Binary Tree. In this scenario, each block of the index contains one entry, along with information about the next greater and next lesser blocks. Searching or traversing a binary tree is, in effect, a binary search -- much faster than scanning the file sequentially. You start at the mid-point, and continually cut the remainder of the file in half until you reach your target. However, with this small block size, it's clear that one would still do a great deal of "churning" from block to block to block in order to find a desired record. Binary Trees quickly gave way to the classic B-Tree algorithm, since larger block sizes typically yield much better efficiency. In a classical B-tree, multiple keys and pointers are stored in blocks. To see how it works, consider a block size of 64 bytes. In it, we want to store a key of 14 bytes and a long integer pointer (we'll call the pointer the record number, but it can be anything... even a code number or a byte position in a text file). So, in this case, each key-plus-pointer needs 18 bytes of storage, meaning we can store just three entries per block. That's a total of 54 bytes with 10 bytes left over. Not enough for another complete entry, so the last 10 bytes represent just so much wasted space. When you add a new entry to a B-tree, the indexing package must search the blocks to find the appropriate storage position. The blocks are maintained in alphabetical order, and a binary-search is used to find the one spot where the new entry should be inserted. If a block becomes full (that is, no room to insert another entry), it must be "split" to create additional storage space. In the first algorithm, a block was split by simply shifting all higher blocks up one position in memory, creating a "hole" for the new block. In essence, the old block became two blocks, and the new entry could be inserted in the appropriate position. Not terribly efficient, but fairly simple to implement. It didn't take long for B-trees to be enhanced. For example, instead of moving all the blocks up during a split, one could just keep track of both the logical position and physical position of each of the blocks. Then, when a split was needed, the new block could be added at the end of the index. Much faster than physically moving all those blocks, but it created a whole new issue: How could the btree manager keep a good track on the block sequence? That was essential in order to implement a binary search. Most B-Tree implementations relied upon a block sequence array. At start-up, the entire index was scanned, to build an in-memory model of the logical sequence. The array was updated each time a block was added or deleted. Fairly fast, but there are obvious problems in a multi-user or multi-threaded environment. While they can certainly be solved, another glaring concern is memory usage: since a separate array is needed for each key, just how much memory can we afford to dedicate to these sequence arrays? There must be a finite limit, or we'd have nothing left for the application! It's for this very reason that early btree managers mandated fixed size, limited pointer size, or even disallowed multiple keys entirely. BTree Managers with small block sizes must also limit the maximum key length. PowerTree uses a block size of 4096 bytes and a key size of up to 255 bytes... Not likely anyone will push that limit too soon. Over the years, the B-Tree algorithm was enhanced by many, evolving into the state-of-the-art B+Tree. Block pointers were moved to disk, to simplifyy multi-user access. Then, block pointers were combined into clusters, excluded from ordinary searches, to vastly improve speed through buffering. Backward pointers were added for quicker "Search Previous" duties. If a block was emptied and deleted, it could be removed from the block map record without shuffling others. If an entire block map record was emptied, nearby entries could be borrowed. With this scheme, you could delete 99% of the entries, and still avoid typical index maintenance. Of course, a complete re-index would reduce the size of the index file, but the elegance lies in the fact that it's never a requirement. So just what is the difference between B-Trees and B+Trees? Actually, that's a very difficult question -- one which seems to have very many answers, depending upon just whom you ask. The differences have been evolutionary... many programmers and theorists have contributed over a long period of time. At PowerBASIC, we use the term B+Tree to just signify the latest version of the algorithm -- one which embodies all of the features we've already described. Since others may rightfully disagree, let's just allow PowerTREE to speak for us, in terms of pure performance. From a scholarly perspective, we're prepared to consider a number alternative definitions. Tapping PowerTREE for Yourself... So enough with the theory -- how about some practical considerations? PowerTREE is exceedingly affordable. The DOS version sells for just $89.00, and can be linked into any PowerBASIC 3.5 executable. The Windows version includes both 16-bit and 32-bit DLLs in one single package priced at $129.00. And the beauty is that you only buy them once -- No royalties nor run-time fees of any sort. Of course, you can't distribute source. You can't distribute the documentation. But you can release all PowerTREE object code to your users, without cost, when it is included as part of another larger software package. [Since PowerTree 1.1, the DOS and Windows versions are included in one single package for just $99.00 complete!] PowerTREE is very, very fast. As much as 31 times faster than some of the most well-known products in the marketplace. Just in case that didn't sink in... up to 3,100% faster than some major players in the database market. In the near future, we'll be posting some specific benchmarks on our web site at www.powerbasic.com. You may wish to watch for their arrival. We'll compare the specific performance of PowerTREE with the well-known packages like Access, Clipper, FoxPro, dBase, VB/Isam, Btrieve, and others. We'll also document some lesser known products like CodeBase, and even Bullit. But frankly, that's only a formality. In all of our comparative testing thus far, we've found nothing which outperforms PowerTREE. Not a one! PowerTREE is flexible. There are virtually no limits on the number of concurrent indexes, nor are there any restrictions on the format of your data. PowerTREE manages your indexes... keys up to 4082 bytes and pointers of +-2,147,483,648 and keeps them separate from your data file, which you maintain, to ensure integrity. Simply put, PowerTREE doesn't sabotage your data format -- leave it any way you like: random files with fixed-length records or variable-length records, record numbers indexed to zero or one, dBase DBF files, sequential files, binary files, or even something special you cooked up just for this occasion! Your data, and how it's structured, remains your business, and your personal property. One other thing: Duplicate keys are fully supported with reproducible results every time. PowerTREE is universal. Use PowerTREE with PowerBASIC, PB/DLL, PB/CC, or practically any Windows language (C/C++, Delphi, Visual Basic, or others). PowerTREE uses the industry-standard DLL calling conventions for universal compatibility. The Windows version includes two sets of code, to immediately support both 16-bit and 32-bit versions of the operating system (Win3/Win95/Win98/WinNT). Best of all, every version of PowerTREE uses the identical file format -- index files can readily be shared between DOS, Win16, and Win32 applications. PowerTREE is very easy to use. Only 14 functions to learn, not hundreds like some of the others, and two of those 14 are for Soundex translations. They're obvious, intuitive, and none require more than two parameters. Call one function to create an index, another to add a key entry, a third to delete an item. Search for an exact match, or just the closest match. Search forward or reverse in sequence. It's just that simple and straightforward. How about networks? A piece of cake. Set just one variable, and PowerTREE handles all the multi-user considerations for you! That's right -- regardless of whether your code runs multi-tasking, multi-user, or multi-threaded, PowerTREE does it all. Automatically. As mentioned earlier, Soundex support is included in two of the PowerTREE functions. Basically, these functions translate any word into a 4-character string based upon the sound of the spoken word. An ingenious idea which actually dates back to the last century. Typically, it's used with surnames, to help locate minor misspellings or mispronunciation -- Smith, Smit, and Smythe would all evaluate to the same Soundex code. One last point... As with all of our products, PowerTREE is a very active participant in the PowerBASIC War On Bloatware! We were quite careful about that particular issue. Depending upon the version, the total footprint runs from 50K to 60K. Not too bad for the fastest BTree Manager in town! We hope we've been able to provide some insight into the structure of B+Trees, as well as our own implementation of PowerTREE. If you find you have a need for indexed file management, you simply can't go wrong with PowerTREE. ====================================================================== Order online at https://www.powerbasic.com/shop/ See more at http://www.powerbasic.com/products/powertree/ ====================================================================== Is your PowerBASIC Gazette Electronic Edition subscription coming to you at home or work? If you don't want to miss a single issue, why not subscribe from both email addresses? Send your subscription request to email@powerbasic.com and please include your name and all email addresses you'd like to add as well as your Zip or Postal Code. If you know someone else who would enjoy this newsletter please forward a copy to them so they can subscribe. ====================================================================== All contents Copyright (c) 1998 PowerBASIC, Inc. All Rights Reserved. PowerBASIC is a registered trademark of PowerBASIC, Inc. PB/CC, PB/DLL, and PowerTREE are trademarks of PowerBASIC, Inc. All other brand names are trademarks or registered trademarks of their respective owners. ======================================================================