Tar PM Optimization

Tar PM optimization removes obsolete data from the CQ repository. You can tune the optimization process by observing its behavior and adjusting performance properties.

Data stored in CQ is composed of blobs of content of varying sizes, such as a 10 MB video or a short string. CQ divides the content into large and small objects using an arbitrary size threshold, 4 KB by default. Objects larger than 4 KB go in the "data store" and objects smaller than 4 KB go into the repository, which uses the tar persistence manager (tar-pm).

Tar-pm uses an append-only mechanism to store data. If an object is changed, a new copy of the object is appended to the repository file and the old object becomes orphaned. The old object still occupies space but is no longer used. As more changes occur, more objects are appended and more old data becomes obsolete. This old data is a storage overhead. Tar-pm optimization cleans up all orphaned content and creates a repository that contains only content that is currently in use.

Tar-pm places content in a series of files, each approximately 256 MB in size. The repository can contain many of these files, numbered chronologically. When a TAR optimization process runs, it takes the oldest file first and transfers all of the active content into a new file. The unused, obsolete content remains in the old file. After all of the active content in an old file has been written to a new output file, the old file is removed.

Because there may be dozens of files or more, and TAR optimization is normally scheduled to run in a fixed time window, a given run might not optimize all of the input files.
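The append-only behavior and the oldest-first optimization pass described above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class and method names (ToyRepository, write, optimize) are hypothetical, files hold a fixed number of records rather than 256 MB, and nothing here reflects the actual CRX implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TarFile:
    """One append-only file holding (key, version) records."""
    number: int
    records: list = field(default_factory=list)


class ToyRepository:
    """Hypothetical in-memory model of an append-only store (not the CRX code)."""

    def __init__(self, records_per_file=4):
        self.records_per_file = records_per_file
        self.files = [TarFile(0)]
        self.latest = {}  # key -> version number of the live copy

    def _append(self, key, version):
        # Start a new, higher-numbered file when the newest one is full.
        if len(self.files[-1].records) >= self.records_per_file:
            self.files.append(TarFile(self.files[-1].number + 1))
        self.files[-1].records.append((key, version))

    def write(self, key):
        # A change never overwrites: it appends a new copy and orphans the old one.
        version = self.latest.get(key, 0) + 1
        self.latest[key] = version
        self._append(key, version)

    def record_count(self):
        return sum(len(f.records) for f in self.files)

    def optimize(self, max_files):
        """Process up to max_files of the oldest files, skipping the newest file."""
        for _ in range(min(max_files, len(self.files) - 1)):
            oldest = self.files.pop(0)
            for key, version in oldest.records:
                if self.latest[key] == version:  # still in use: copy forward
                    self._append(key, version)
            # Obsolete records are dropped together with the old file.
```

Here max_files stands in for the fixed time window: a run with a small budget leaves the remaining old files for the next run, which again starts with the oldest file present.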
Because the optimizer process always begins with the first file, the process generally requires the same amount of time each time it runs.

[Diagram: how TAR files are processed and reprocessed in successive runs]

It is not necessarily a problem if the TAR optimizer does not finish processing all of the files in the CQ repository during a given run. For example, if the repository contains 24 files but TAR optimization processed only 12 files in an overnight run, you can assume that the next run resumes where the previous run stopped and processes the remaining 12 files. The effective throughput is then 12 files per day, and with 24 files the average re-optimization interval is two days.

© 2012 Adobe Systems Incorporated. All rights reserved. Created on 2014-12-16

MEASURING OPTIMIZATION PROGRESS

To estimate the average time required to optimize all TAR files, count the TAR files in the repository and examine error.log to see how many files are processed each night. TAR PM optimization processes files that are located in three areas of the repository:

• /crx-quickstart/repository/workspaces/crx.default
• /crx-quickstart/repository/tarJournal
• /crx-quickstart/repository/version

Typically, the workspaces TAR files require the most time for TAR optimization. From the log, you can see which files are processed and the time required to optimize them.
The following example data illustrates the information that can be extracted from the log regarding TAR PM optimization activity:

    Start     End       TAR file                   Time (s)
    02:00:03  02:11:17  workspaces/data_01724.tar  674.2
    02:11:17  02:22:29  workspaces/data_01725.tar  672.3
    02:22:29  02:33:25  workspaces/data_01726.tar  655.5
    02:33:25  02:44:18  workspaces/data_01727.tar  653.4
    02:44:18  02:55:46  workspaces/data_01728.tar  687.9
    02:55:46  03:06:46  workspaces/data_01729.tar  659.7
    03:06:46  03:17:30  workspaces/data_01730.tar  644.4
    03:17:30  03:27:53  workspaces/data_01731.tar  622.5
    03:27:53  03:38:18  workspaces/data_01732.tar  625.3
    03:38:18  03:49:32  workspaces/data_01733.tar  673.9
    03:49:32  03:59:13  workspaces/data_01734.tar  580.6
    03:59:13  04:08:54  workspaces/data_01735.tar  581.1
    04:08:54  04:17:36  workspaces/data_01736.tar  522.2
    04:17:36  04:28:30  workspaces/data_01737.tar  654.2
    04:28:30  04:39:21  workspaces/data_01738.tar  650.9
    04:39:21  04:48:30  workspaces/data_01739.tar  549.1
    04:48:30  04:59:21  workspaces/data_01740.tar  651.5
    04:59:21  05:00:00  workspaces/data_01741.tar  38.536

In this example, TAR PM optimization processes about 17 files, each requiring about 600 seconds. The last file was abandoned at 0500, and TAR PM optimization did not complete.
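As a rough check on the "about 600 seconds" figure, the per-file times in the example table above can be summarized with a few lines of Python. This is a minimal sketch: the durations are copied from the table (the abandoned data_01741.tar is excluded), the 256 MB file size is the approximate value given in the text, and the function name summarize is purely illustrative.

```python
from statistics import mean

# Times in seconds for the 17 files that completed in the example table
# (data_01724.tar through data_01740.tar). The abandoned file is excluded.
COMPLETED_SECONDS = [
    674.2, 672.3, 655.5, 653.4, 687.9, 659.7, 644.4, 622.5, 625.3,
    673.9, 580.6, 581.1, 522.2, 654.2, 650.9, 549.1, 651.5,
]

TAR_FILE_MB = 256  # approximate TAR file size stated in the text


def summarize(durations, file_mb=TAR_FILE_MB):
    """Return file count, mean seconds per file, and MB/sec of TAR content."""
    avg = mean(durations)
    return {
        "files": len(durations),
        "mean_seconds": avg,
        "mb_per_second": file_mb / avg,
    }
```

With the table data, summarize(COMPLETED_SECONDS) gives a mean of roughly 630 seconds per file, which is consistent with the rounded figure used in the text.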
To assess the remaining files that were not optimized, look in the workspace directory:

    /crx-quickstart/repository/workspaces/crx.default
    total 4973816
    -rw-r--r-- 1 user1 user1 268438016 Sep 27 14:19 data_01741.tar
    -rw-r--r-- 1 user1 user1 268438016 Sep 27 14:35 data_01742.tar
    -rw-r--r-- 1 user1 user1 268437504 Sep 27 14:51 data_01743.tar
    -rw-r--r-- 1 user1 user1 269306880 Sep 27 15:10 data_01744.tar
    -rw-r--r-- 1 user1 user1 268588544 Sep 27 15:32 data_01745.tar
    -rw-r--r-- 1 user1 user1 269104128 Sep 27 15:49 data_01746.tar
    -rw-r--r-- 1 user1 user1 268961792 Sep 27 16:11 data_01747.tar
    -rw-r--r-- 1 user1 user1 269267456 Sep 27 16:22 data_01748.tar
    -rw-r--r-- 1 user1 user1 271622656 Sep 27 16:32 data_01749.tar
    -rw-r--r-- 1 user1 user1 268437504 Sep 28 02:31 data_01750.tar
    -rw-r--r-- 1 user1 user1 268438016 Sep 28 03:05 data_01751.tar
    -rw-r--r-- 1 user1 user1 268438016 Sep 28 03:19 data_01752.tar
    -rw-r--r-- 1 user1 user1 268438016 Sep 28 03:31 data_01753.tar
    -rw-r--r-- 1 user1 user1 268438016 Sep 28 03:52 data_01754.tar
    -rw-r--r-- 1 user1 user1 268438016 Sep 28 04:02 data_01755.tar
    -rw-r--r-- 1 user1 user1 268438016 Sep 28 04:11 data_01756.tar
    -rw-r--r-- 1 user1 user1 268437504 Sep 28 04:27 data_01757.tar
    -rw-r--r-- 1 user1 user1 268438016 Sep 28 04:49 data_01758.tar
    -rw-r--r-- 1 user1 user1 81231360 Sep 28 10:02 data_01759.tar
    drwxr-xr-x 12 user1 user1 4096 Sep 28 10:02 index
    -rw-r--r-- 1 user1 user1 135413760 Sep 28 05:00 index_1_426.tar
    -rw-r--r-- 1 user1 user1 21861888 Sep 28 05:01 index_391_12.tar
    -rw-r--r-- 1 user1 user1 0 Sep 27 16:01 locks
    -rw-r--r-- 1 user1 user1 11436057 Sep 20 15:11 q
    -rw-r--r-- 1 user1 user1 1872 Sep 27 15:58 workspace.xml

The TAR PM processed and deleted files data_01724.tar through data_01740.tar. The files created between 0200 and 0500 are the outputs of the optimization (files data_01750.tar through data_01758.tar).
In this case, TAR optimization generated approximately 9 output files after processing 17 input files, so it reduced space requirements by about 2:1. When the scheduling constraints stopped the optimization process at 0500, the process abandoned the data_01741.tar file, which became the oldest file in this directory. The files that TAR optimization did not process are data_01741.tar through data_01749.tar, a total of 9 files. Of the 26 files, it processed 17, or 65%. The next day, the scheduled optimization would complete those 9 files and around 8 more. In this case, the average time between optimizations of any file is about a day and a half, which is likely satisfactory.

IMPROVING THE PERFORMANCE OF TAR OPTIMIZATION

The optimization throughput observed on the snokzlx14 data has been fairly consistent at about 600 seconds per 256 MB TAR file, or about 0.43 MB/sec. This is not a high I/O rate, and the CPU is not heavily utilized either. In the case of the snokzlx14 data, the time spent in TAR optimization is traceable to a throttling mechanism that is designed to limit the impact of TAR optimization on production transactions.

CONFIGURING THE TAROPTIMIZATIONDELAY PROPERTY

TAR optimization delay is a throttling mechanism that ensures adequate system resources are available for higher-priority production transactions. On systems with limited I/O capacity, TAR optimization can starve other CQ operations of filesystem I/O bandwidth, affecting the performance or even the stability of the system.

To remove the delay, set the TarOptimizationDelay attribute to 0:

1. Open the CQ Web Console and click the JMX tab (http://localhost:4502/system/console/jmx).
2. Click the Repository MBean for the com.adobe.granite domain.
3. In the table, click the value of the TarOptimizationDelay attribute, change the value to 0, and click Save.
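Before changing the delay, it can help to project how many files fit in the scheduled window at a given per-file time, and what re-optimization interval results. The sketch below only restates the arithmetic used in this document (a window from 0200 to 0500, about 600 seconds per file, 26 files of which 17 were processed). The function names are illustrative and not part of any CQ tooling.

```python
def files_per_window(window_seconds, seconds_per_file):
    """Number of whole files that can be optimized in one scheduled window."""
    return int(window_seconds // seconds_per_file)


def reoptimization_interval_days(total_files, files_per_run, runs_per_day=1):
    """Average number of days between optimizations of any given file."""
    return total_files / (files_per_run * runs_per_day)


# Example inputs taken from the text: a three-hour window and ~600 s per file.
WINDOW_SECONDS = 3 * 3600
THROTTLED_SECONDS_PER_FILE = 600
```

For example, reoptimization_interval_days(26, 17) is about 1.5, matching the day-and-a-half estimate, and files_per_window(WINDOW_SECONDS, THROTTLED_SECONDS_PER_FILE) is 18, close to the 17 complete files seen in the log. Substituting a shorter per-file time shows how much headroom a lower delay could give on a system where the delay is the bottleneck.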
CONFIGURING THE INDEXINMEMORY PROPERTY

It is possible that file system I/O capacity is the limiting factor for TAR optimization performance. In this case, you can configure the indexInMemory property to reduce the I/O requirements. The following procedure enables indexInMemory and increases the JVM heap so that enough memory is available to hold the index. The increase is double the current index size to allow for future growth.

Your system could benefit from this configuration if you observe high CPU usage and high disk usage during Tar PM optimization. If enabling the indexInMemory property has a negligible effect, the original configuration is adequate. For more information, see Performance Tuning Tips.

1. Use a text editor to open the crx-quickstart/repository/workspaces/crx.default/workspace.xml and crx-quickstart/repository/repository.xml files. Add the <param name="indexInMemory" value="true"/> element to the PersistenceManager element, as in the following example:

       <PersistenceManager class="com.day.crx.persistence.tar.TarPersistenceManager">
         <param name="indexInMemory" value="true" />
       </PersistenceManager>

2. Calculate the total size of the index*.tar files in the crx-quickstart/repository/workspaces/crx.default directory, in MB.
3. Calculate the total size of the index*.tar files in the crx-quickstart/repository/version directory, in MB.
4. Double the sum of the two totals and add the value to the maximum heap size of the JVM:

       Increase in heap size = (total from step 2 + total from step 3) x 2

For example, the startup script for a server uses the -Xmx2048m parameter to configure the heap size of the JVM. Step 4 yields an increase of 1000 MB for this server. Therefore, the heap size is increased by 1000 MB (2048 MB + 1000 MB = 3048 MB, rounded up in this example to -Xmx3072m as the JVM parameter).
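Steps 2 through 4 above can be scripted. The following is a minimal sketch, assuming that the index*.tar files sit directly in the two directories named in the procedure and that 1 MB means 1024 x 1024 bytes. The function names are illustrative and not part of any CQ tooling.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: sizes are reported in binary megabytes


def index_tar_mb(directory):
    """Total size in MB of the index*.tar files directly inside directory."""
    return sum(p.stat().st_size for p in Path(directory).glob("index*.tar")) / MB


def heap_increase_mb(workspace_index_mb, version_index_mb):
    """Step 4: double the sum of the two index totals."""
    return 2 * (workspace_index_mb + version_index_mb)


def new_xmx_mb(current_xmx_mb, increase_mb):
    """Heap size to configure after adding the increase to the current -Xmx."""
    return current_xmx_mb + increase_mb
```

A typical use would call index_tar_mb once for crx-quickstart/repository/workspaces/crx.default and once for crx-quickstart/repository/version, then pass both totals to heap_increase_mb. With the example values from the text, new_xmx_mb(2048, 1000) gives 3048, which the text's example sets as -Xmx3072m.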
TAR PM OPTIMIZATION CASE STUDY

In this example case study, iostat is used to monitor CPU and disk usage during the TAR optimization process. The disk is between 6% and 7% busy and the CPU is between 4% and 5% busy. The disks are doing about 100 transfers per second, which is a considerable load but does not approach their capability.

In this chart, the percentages of disk (red) and CPU (blue) utilization are plotted over time. The green line represents the overall disk write throughput in MB/sec (the right axis). The orange horizontal dashes represent the processing of TAR files. The length of each line represents the time required for processing, and the vertical position represents the time taken in seconds.

The chart shows that the system was idle (0% utilization) before and after TAR optimization occurred. During optimization, the disk throughput is about 20 MB/sec, which iostat reports as about 6% of throughput capacity. The TAR optimization rate averages about 24.5 seconds per file, or about 10 MB/sec of TAR file content.

Even with the delay eliminated, the results above do not indicate that I/O capacity limits TAR optimization. To validate that system I/O capacity is adequate, TAR optimization performance is measured again, this time in the presence of a large amount of background disk I/O (generated for testing purposes). The generated background load consists of the following activities:

• Copy all of the TAR files from author/crx-quickstart/repository/workspaces/crx.default to a temporary directory on the same physical filesystem.
• Repeatedly copy each of the files in succession, from the original filename to a temp file, using dd if="$IN" of="$OUT" bs=512

Note that the block size is relatively small, at 512 bytes. While the background load is running, the throughput reported by iostat and by the dd commands is about 220 MB/sec.
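For readers who want to reproduce a similar small-block background load without dd, the copy step can be approximated in Python. This is only a sketch and not the script used in the case study: the function name copy_in_blocks is hypothetical, and unbuffered file objects are an assumption made so that each read and write is issued at the 512-byte block size.

```python
def copy_in_blocks(src, dst, block_size=512):
    """Copy src to dst one block at a time and return the bytes copied.

    buffering=0 disables Python's own buffering so that requests reach the
    operating system at block_size granularity, similar in spirit to dd bs=512.
    """
    copied = 0
    with open(src, "rb", buffering=0) as fin, open(dst, "wb", buffering=0) as fout:
        while True:
            block = fin.read(block_size)
            if not block:  # empty bytes object means end of file
                break
            view = memoryview(block)
            while len(view) > 0:  # an unbuffered write may be partial
                written = fout.write(view)
                view = view[written:]
            copied += len(block)
    return copied
```

Calling this function in a loop over the copied TAR files would generate sustained disk activity. The throughput achieved depends on the system and is not expected to match the 220 MB/sec reported above.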
TAR optimization is performed while this background I/O load is running. The following chart shows the results using the same layout as above. Note that in this chart the maximum axis values are very different.

As expected, the disk throughput is much higher, at over 200 MB/sec. The disk utilization reported is on the order of 75%. The background load alone requires 5% CPU usage. The additional load of TAR optimization increases the required CPU usage to 10%. The throughput of TAR optimization generally follows the same pattern, where individual files are processed in around 25 seconds.

[Chart: direct comparison of TAR PM optimization throughput for the no-load and background-load cases, with the TarOptimizationDelay property set to 0]

Although the times are comparable, the average TAR optimization time per file is longer when the parallel load is present, at 29 seconds versus 25 seconds. The overall TAR optimization time is also slightly longer, at just over 10 minutes versus just over 8 minutes. The differences in timing are real and measurable, but not significant. It appears that substantial amounts of parallel disk activity have only a small effect on TAR optimization throughput.

As a result of this analysis, the conclusion is that it would be reasonable, on this particular test system, to use the TarOptimizationDelay=0 option in circumstances where the throughput of normal TAR optimization is insufficient. In that case, a TAR optimization with the delay removed could be run on a weekly or bi-weekly basis, scheduled so that any interaction with normal application load due to high I/O use can be monitored and managed.
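The comparison above can be expressed as two small calculations. The sketch below uses only figures quoted in this document (about 25 seconds per file with no load, about 29 seconds under background load, and roughly 256 MB per TAR file). The function names are illustrative.

```python
def relative_slowdown(baseline_seconds, loaded_seconds):
    """Fractional increase in per-file time caused by the background load."""
    return (loaded_seconds - baseline_seconds) / baseline_seconds


def mb_per_second(file_mb, seconds_per_file):
    """TAR content processed per second for a given per-file time."""
    return file_mb / seconds_per_file
```

With these inputs, relative_slowdown(25, 29) is 0.16, a 16% increase per file, and mb_per_second(256, 25) is about 10 MB/sec, compared with about 0.43 MB/sec at 600 seconds per file. Whether a slowdown of that size matters is a judgment for the particular system. The document treats it as insignificant for this test machine.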