Essbase Optimization (BSO)
(Courtesy of public blogger Aditya Saini - posted July 11, 2012)
- Block size
- Block density
- Cache settings
- Outline dimension order
- Data Compression
- Optimizing data loads
Block Size
The Essbase Admin Guide says blocks should be between 1-100 KB in size, but nowadays, with more memory on servers, they can be larger. My experience is to keep blocks below 200-250 KB but not less than 1-2 KB, though this all depends on the actual data density in the cube.
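For illustration, block size is simply the number of stored members in each dense dimension multiplied together, times 8 bytes per cell. A minimal Python sketch, using made-up dimension names and member counts:

    # Block size = product of stored members across dense dimensions x 8 bytes per cell.
    # The dimension names and member counts here are hypothetical.
    dense_stored_members = {"Measures": 40, "Time": 17}

    cells = 1
    for count in dense_stored_members.values():
        cells *= count

    block_kb = cells * 8 / 1024
    print(f"{cells} cells per block, {block_kb:.1f} KB per block")  # 680 cells, 5.3 KB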
Block Density
The 'density' of a block is the ratio of cells that actually have data to the total number of cells in the block.
This gives an indication of the average percentage of each block that contains data. In general data is sparse, so a value over 1% is actually quite good. If your block density is over 5%, then your dense/sparse settings are generally spot-on. A large block with high density is OK, but large blocks with very low density (<1%) are not.
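A quick sketch of the density calculation, with hypothetical cell counts:

    # Density = non-missing cells / total cells in the block.
    total_cells = 680   # hypothetical block of 680 cells
    non_missing = 14    # cells that actually hold data

    density = non_missing / total_cells
    print(f"block density: {density:.1%}")  # ~2.1% -- sparse, but over the 1% mark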
Cache Settings
Never leave a cube with the default cache settings; the defaults are never enough (except for a very basic cube). Essbase uses the following memory caches to coordinate memory usage:
1. Index Cache
2. Data File Cache
3. Data Cache
4. Calculator Cache
Index cache:
The index cache is a buffer in memory that holds index pages. How many index pages are in memory at one time depends upon the amount of memory allocated to the cache.
Min - 1024 KB (1048576 bytes)
Default - Buffered I/O: 1024 KB (1048576 bytes)
Opt - Combined size of all essn.ind files, if possible; otherwise as large as possible. Do not set this cache size higher than the total index size, as no performance improvement results.
Data file cache:
The data file cache is a buffer in memory that holds compressed data files (.pag files). Essbase allocates memory to the data file cache during data load, calculation, and retrieval operations, as needed. The data file cache is used only when direct I/O is in effect.
Min - 10240 KB (10485760 bytes)
Default - Direct I/O: 32768 KB (33554432 bytes)
Opt - Combined size of all essn.pag files, if possible; otherwise as large as possible. This cache setting is not used if Essbase is set to use buffered I/O.
Data cache:
The data cache is a buffer in memory that holds uncompressed data blocks. Essbase allocates memory to the data cache during data load, calculation, and retrieval operations, as needed.
Min - 3072 KB (3145728 bytes)
Default - 3072 KB (3145728 bytes)
Opt - 0.125 * the value of the data file cache size, or 3 times the index cache size.
Calculator cache:
The calculator cache is a buffer in memory that Essbase uses to create and track data blocks during calculation operations.
Min - 4 bytes
Max - 200,000,000 bytes
Default - 200,000 bytes
Opt - Depends on the number and density of the sparse dimensions in your outline and on the amount of memory the system has available.
The rule of thumb here is to see whether you can fit the entire index file into the index cache, and to make the data cache 3 times the index cache, or at least some significant size. Also check your cube statistics for the hit ratios on the index and data caches; these indicate what percentage of the time the data being searched for is found in memory. For the index cache this should be as close to 1 as possible; for the data cache, as high as possible.
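As a rough starting point, these sizes can be derived from the cube's files on disk. A Python sketch, assuming a hypothetical database directory path:

    from pathlib import Path

    # Hypothetical database directory; adjust for your install.
    db_dir = Path("/hyperion/essbase/app/Sample/Basic")

    # Sum the index (.ind) and page (.pag) file sizes on disk.
    index_bytes = sum(f.stat().st_size for f in db_dir.glob("ess*.ind"))
    page_bytes = sum(f.stat().st_size for f in db_dir.glob("ess*.pag"))

    index_cache = index_bytes                       # aim to hold the whole index in memory
    data_cache = max(3 * index_cache, 3072 * 1024)  # ~3x index cache, never below the 3072 KB default

    print(f"index cache: {index_cache // 1024} KB, data cache: {data_cache // 1024} KB")
    print(f"(total page files on disk: {page_bytes // 1024} KB)")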
Outline Dimension Order
Remember the hourglass principle. This means ordering the dimensions in your outline as follows - first put the largest (in number of members) dense dimension, then the next largest dense dimension, and continue until the smallest dense dimension. Then put the smallest sparse dimension, then the next smallest, and continue until the largest sparse dimension. Finally, put all non-aggregating sparse dimensions (e.g. Year, Version, Scenario) beneath the largest sparse dimension. Because of the way the Essbase calculator works, this arrangement minimizes the number of passes through the cube.
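A small Python sketch of the hourglass rule, using invented dimension names and member counts:

    # Hourglass ordering: dense largest-to-smallest, then aggregating sparse
    # smallest-to-largest, then non-aggregating sparse last.
    # Tuples are (name, member count, density, aggregating) -- all values hypothetical.
    dims = [
        ("Measures", 400, "dense", True),
        ("Time", 17, "dense", True),
        ("Product", 2500, "sparse", True),
        ("Market", 250, "sparse", True),
        ("Scenario", 4, "sparse", False),
        ("Year", 5, "sparse", False),
    ]

    dense = sorted((d for d in dims if d[2] == "dense"), key=lambda d: -d[1])
    sparse = sorted((d for d in dims if d[2] == "sparse" and d[3]), key=lambda d: d[1])
    non_agg = [d for d in dims if d[2] == "sparse" and not d[3]]

    for name, *_ in dense + sparse + non_agg:
        print(name)  # Measures, Time, Market, Product, Scenario, Year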
Data Compression
Essbase allows you to choose whether data blocks that are stored on disk are compressed, as well as which compression scheme to use. When data compression is enabled, Essbase compresses data blocks when it writes them out to disk.
Types of data compression:
- Bitmap compression (default).
- Run-length encoding (RLE).
- zlib compression.
- Index Value Pair compression.
- No compression.
Bitmap compression:
Essbase stores only non-missing values and uses a bitmapping scheme. A bitmap uses one bit for each cell in the data block, whether the cell value is missing or non-missing. When a data block is not compressed, Essbase uses 8 bytes to store every non-missing cell. In most cases, bitmap compression conserves disk space more efficiently. However, much depends on the configuration of the data.
Run-length encoding (RLE):
Essbase compresses repetitive, consecutive values - any value that repeats three or more times consecutively, including zeros and #MISSING values. Each data value that is repeated three or more times uses 8 bytes plus a 16-byte repetition factor.
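To see the arithmetic, here is a rough back-of-the-envelope comparison for one hypothetical block; real blocks carry some header overhead that is ignored here:

    # Stored-size estimates for a hypothetical 1,000-cell block at 3% density,
    # where the 970 #MISSING cells form one consecutive run.
    CELLS = 1_000
    NON_MISSING = 30

    uncompressed = CELLS * 8                # 8 bytes per cell, missing or not
    bitmap = NON_MISSING * 8 + CELLS // 8   # 8 bytes per non-missing cell + 1 bit per cell
    rle = NON_MISSING * 8 + (8 + 16)        # values + one run record (8-byte value + 16-byte factor)

    print(f"uncompressed: {uncompressed} B")  # 8000 B
    print(f"bitmap:       {bitmap} B")        # 365 B
    print(f"RLE:          {rle} B")           # 264 B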
zlib compression:
Essbase builds a data dictionary based on the actual data being compressed. This method is also used in packages like PNG, Zip, and gzip. Generally, the denser or more heterogeneous the data is, the better zlib compresses it in comparison to bitmap or RLE compression. If your database is around 90% dense, consider zlib as the compression method.
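Python's standard zlib module illustrates the behavior. The sketch below compresses a fully dense block of simulated cell values: bitmap compression would save nothing here, since no cell is #MISSING, while zlib still exploits the repeated values typical of real data:

    import random
    import struct
    import zlib

    CELLS = 10_000  # hypothetical fully dense block

    # Real ledger data tends to repeat values; simulate that with a small value set.
    distinct = [0.0, 100.0, 250.25, 999.99, 1234.56]
    raw = struct.pack(f"<{CELLS}d", *(random.choice(distinct) for _ in range(CELLS)))

    print(f"uncompressed: {len(raw)} bytes")                # 80000 bytes (8 per cell)
    print(f"zlib:         {len(zlib.compress(raw))} bytes")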
Index Value Pair compression:
"Index Value Pair" compression is selected automatically by the Essbase system. Essbase applies this compression if the block density is less than 3%. Index Value Pair addresses compression on databases with larger block sizes, where the blocks are highly sparse. zlib does not use this.
Optimizing Data Loads
The best technique to make large data loads faster is to order the fields in your source file (or SQL statement) optimally and to sort the records accordingly. Use hourglass dimension order: list dimensions from the bottom of the outline upwards, so that your dense dimensions come last, and if you have multiple data columns these should be dense dimension members. This causes blocks to be created and filled with data in sequence, making the data load faster and the cube less fragmented.
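A minimal sketch of such a pre-sort, assuming a hypothetical CSV load file whose sparse dimension columns are named below:

    import csv

    # Sort the load file so all rows for one sparse member combination are adjacent,
    # letting Essbase create and fill each block once. Column names are hypothetical.
    SPARSE_COLS = ["Year", "Scenario", "Market", "Product"]

    with open("load_file.csv", newline="") as f:
        rows = sorted(csv.DictReader(f), key=lambda r: tuple(r[c] for c in SPARSE_COLS))

    with open("load_file_sorted.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)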
Fragmentation is likely to occur with the following:
• Read/write databases that users are constantly updating with data
• Databases that execute calculations around the clock
• Databases that frequently update and recalculate dense members
• Data loads that are poorly designed
• Databases that contain a significant number of Dynamic Calc and Store members
One can measure fragmentation using the average clustering ratio.
The average clustering ratio database statistic indicates the fragmentation level of the data (.pag) files. The maximum value, 1, indicates no fragmentation.
To remove fragmentation, either export the database, delete all data with CLEARDATA, and reload the export file, or force a dense restructure of the database.
In summary:
Step 1: Minimize the Number of Dimensions
Step 2: Evaluate Dense/Sparse Settings (Optimize the Block Size)
Step 3: Order the Outline in the Hourglass Model
Step 4: Optimize Cache Settings
Step 5: Optimize Data Loads