Title: Performance and Tuning on Solaris[TM] 2.6, 7 and 8

Document ID: 21622
Update Date: Wed Jun 16 2004
Technical Areas: Kernel CPU (Central Processing Unit)
Note: edited copy from Sun's website; copyright notice at bottom.

When a system is running slowly and performance is degrading, it is difficult to know what the cause is. Whether the cause is a lack of memory, a disk subsystem bottleneck or the limited scalability of a particular application, there are ways to find, understand and possibly remove the root cause. This article gives suggestions on where to start. It covers how to approach performance concerns, addresses some common performance bottlenecks, and introduces a number of concepts, such as Intimate Shared Memory (ISM) and priority paging, which are intertwined with performance. The emphasis is on Solaris[TM] 2.6, 7 and 8. It is not a complete treatment of all performance issues, but is intended to be a place to start, to stimulate your thinking about Solaris system performance, and to suggest where to go next.

Contents

1. Approaching Performance Problems
2. PERFORMANCE MONITORING
   2.1. Start at the top.
   2.2. Know what your systems do in good times.
   2.3. Looking for a Performance Bottleneck
3. Some Commonly Asked Questions and some suggestions
   3.1. What does 64-bit Sizing and Capacities buy me?
   3.2. Free Memory
   3.3. Priority paging
   3.4. Intimate Shared Memory (ISM)
   3.5. Swap Configurations Related to Shared Memory
   3.6. Interprocess Communication Parameters (IPC)
Appendix
   A. IPC articles in SunSolve SRDBs and InfoDoc Collections
   B. Sun Performance Information
   C. Useful SunWorld Online Articles
   D. Useful Books

1. Approaching Performance Problems

Performance, perhaps more than any other aspect of computer system behaviour, requires a holistic approach. To identify a cause rooted in one or more components, a structured approach is a must. The practical upshot is that the single most important part of the troubleshooting process is to define the problem you are trying to solve. In practical terms this means defining an operation or test case for which:

  a) You know how fast it goes now.
  b) You have a requirement for it to go X times faster, or it has gone X times faster under different circumstances.

Setting the baseline from which to start is the first step. Performance analysis is a top-down sport: it starts with a clear and concise statement of the problem to be solved. If you want a system to go faster, you still need to define what attribute of that system you aim to improve and what tradeoffs you will and won't accept. Until you can clearly describe the symptoms of the problem or opportunity, identifying the root cause will always be hit and miss.

Performance analysis is much like detective work. We establish the facts of the case through evidence and observation, being very careful not to jump to a premature conclusion which does not fit the facts, and we name the suspect only when the weight of evidence is overwhelming. Be skeptical about all assumptions. What others state as a fact may really be an assumption which may or may not be correct. If the assumption is wrong, you may be working with false evidence and will arrive at an incorrect conclusion.

Some words of warning. Solaris is in most cases very good at tuning itself for the workload in hand. The later the release, the less tuning should be required. It has often been found that the root cause of a performance problem is an earlier attempt at performance tuning. Pay attention to the application first and the Operating Environment last.

After any change to the system configuration, such as memory size or disk layout, performance settings should be checked to confirm they are still valid. This is also true of an upgrade, where carrying parameters across may limit the performance of the new OS.

2. PERFORMANCE MONITORING

2.1. Start at the top.

What operation(s) do you see which are symptoms of the performance problem(s)? For example, are particular types of database query, file or network operations slower than you think they should be? How specific can you be about the operation in terms of providing a test case, such as an SQL query or 30 lines of C? Define your problem statement as precisely as possible, as a statement of "what is wrong with what". Some examples of good problem statements include:

  a. An SQL query takes 2X longer on VXFS when compared to UFS.
  b. SVR4 message queue operations take 30% longer on OS revision A compared to OS revision B.
  c. Login to system A takes 3x longer than login to system B.

A problem statement should not contain the solution or a possible solution. Most times, getting a clear statement of the problem is more than half way to solving it. It is important to take account of the perspective of the user in stating the problem you are trying to solve; this means taking the application perspective. This goes against human nature, which tries to prove or disprove a possible cause by experimenting rather than assessing the merit of a cause relative to observed facts. Poor problem statements include:

  a. mpstat wt column shows a high wait time
  b. User jobs take too long
  c. The SE toolkit has red on it

The boundary between the correct functioning of a system and its applications and a performance problem is often a grey area. Entire system hangs and process hangs are functionality problems and are beyond the scope of this article. If you suspect incorrect functioning of the system as opposed to a performance problem, then log a call with your Sun Solution Centre to develop a course of action. The correct functioning of the system is a prerequisite for a system that performs well. As part of your proactive maintenance schedule, it is worth checking /var/adm/messages for indications of hardware issues such as disk retries or excessive message generation.

It is well worth looking back at the history of the system. If your system has given better performance in the past, draw a time line detailing the changes made before poor performance was first noticed and when it has been seen since.

2.2. Know what your systems do in good times.

It is a good idea to keep some examples of how your system behaves in good times. Perhaps each month, store performance data when the system is running as expected, including:

  a. The *stat family: vmstat, mpstat, iostat, vxstat
  b. sar
  c. ps output to show what processes are running (prstat on Solaris 8)

In addition, a number of commercial and unsupported products are available for performance monitoring. A free, unsupported alternative is the SE Toolkit, available from the following URL:

  http://www.sun.com/sun-on-net/performance/se3/

It reports disk activity, cpu usage, TCP and network connections, memory and more. It is easy to install, does not require a reboot, and produces easy to understand graphical displays.

One of the issues with many such products is that threshold values are different for different hardware configurations. Values which would be considered excessive and may bring a 400MHz system to its knees may be acceptable on a 900MHz system, for example. Just because it's red does not mean it's bad.

2.3. Looking for a Performance Bottleneck

Once you have defined the performance problem you are trying to solve, the next step is to narrow down the area in which the bottleneck occurs. Questions worth asking at this stage include:

  a. What can the application tell me about what it sees as a bottleneck? Taking Oracle as an example, an Oracle DBA should know what BSTAT/ESTAT reports are and how to run and interpret them. Again, taking the application perspective, BSTAT/ESTAT may show the bottleneck which is limiting Oracle performance and guide further analysis.
  b. Where are we spending most time, in the kernel or in user land? Answer with vmstat, mpstat or sar, ps and prstat.
  c. Are all resources of a similar type equally busy? The intent is to find an unequal distribution of load; for example, one disk may be a bottleneck or one cpu may be busier than the others. For cpus look at mpstat. For disks use iostat.
  d. What process(es) are using the most resources?

To see the top processes using CPU and memory resources:

    ps -eo pid,pcpu,args | sort +1n      sorted by %cpu
    ps -eo pid,vsz,args | sort +1n       sorted by kilobytes of virtual memory
    /usr/ucb/ps aux | more               sorted with the highest users (processes) of cpu and memory at the top

The two sort pipelines order the output numerically in ascending order, so the largest consumers appear last. Solaris 8 provides prstat, which gives a running commentary of CPU and memory use. The output from prstat -cvm is very useful.

We now look at how to use some of the common Solaris commands for initial performance analysis.
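
The ps pipelines in question (d) can also be post-processed by a small script when the output is being archived or compared over time. The following is a minimal Python sketch, not part of the original article: the function name top_consumers is hypothetical, and it assumes the three-column layout produced by "ps -eo pid,pcpu,args" or "ps -eo pid,vsz,args" (PID, one numeric column, then the command line).

```python
def top_consumers(ps_output, count=5):
    """Return the `count` largest (value, pid, args) rows from ps output.

    Assumes a header line followed by rows of: PID, a numeric column
    (for example %CPU or VSZ), and the command line.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:   # skip the header line
        fields = line.split(None, 2)          # the command line may contain spaces
        if len(fields) < 2:
            continue
        args = fields[2] if len(fields) > 2 else ""
        try:
            rows.append((float(fields[1]), int(fields[0]), args))
        except ValueError:
            continue                          # ignore rows that do not parse
    rows.sort(reverse=True)                   # largest consumer first
    return rows[:count]
```

To use it, capture the output of one of the ps commands above into a string (for example with subprocess.run) and pass it to top_consumers; unlike the sort pipelines, the result lists the largest consumers first.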

2.3.1. Using the vmstat Command

The vmstat command is concise. Here we can see an example of not enough cpu capacity for the executing applications.

    % vmstat 15
     procs     memory               page               disk         faults       cpu
     r b w    swap   free re  mf  pi po   fr de  sr m0 m1 m2 m3   in   sy   cs us sy id
    45 0 0 2887216 182104  3 707 449  6  455  0  80  2  6  1  0 1531 5797  983 61 30  9
    58 0 0 2831312  46408  5 983 582 56 3211  0 492  0  0  0  0 1413 4797 1027 69 31  0
    55 0 0 2830944  56064  2 649 656  3  806  0 121  0  0  0  0 1441 4627  989 69 31  0
    57 0 0 2827704  48760  4 818 723  6  800  0 121  0  0  1  0 1606 4316 1160 66 34  0
    56 0 0 2824712  47512  6 857 604 56 1736  0 261  0  0  1  0 1584 4939 1086 68 32  0
    58 0 0 2813400  47056  7 856 673 33 2374  0 355  0  0  0  0 1676 5112 1114 70 30  0
    60 1 0 2816712  49464  7 861 720  6  731  0 110  7  0  3  0 2329 6131 1067 64 36  0
    58 0 0 2817552  48392  4 585 521  0  996  0 146  0  0  0  0 1357 6724 1059 71 29  0

Always ignore the first line of vmstat output, which is a summary since boot rather than a sample of the interval. The column labelled "r" under the procs section is the run queue of processes waiting to get on the cpus. The "id" column is cpu idle time. This machine lacks the cpu resources to keep up with the process demand: the run queue stays above 50, idle time is zero, and the majority of cpu time is spent in user space (see the us column). Two approaches can be taken here: add extra cpus, or profile the application code to determine whether part of the application can be optimised. A lot of effort can be expended profiling sections of code for little gain, so be realistic.
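
As a rough illustration of how the sample above can be read mechanically, the following Python sketch drops the first line and summarises the run queue and the cpu columns. It is not part of the original article: the function name summarize_vmstat and the use of Python are assumptions made for illustration, and the data is the vmstat 15 sample from this section with the group header line omitted.

```python
# Rows copied from the vmstat 15 sample in this section (group header omitted).
VMSTAT_SAMPLE = """\
r b w swap free re mf pi po fr de sr m0 m1 m2 m3 in sy cs us sy id
45 0 0 2887216 182104 3 707 449 6 455 0 80 2 6 1 0 1531 5797 983 61 30 9
58 0 0 2831312 46408 5 983 582 56 3211 0 492 0 0 0 0 1413 4797 1027 69 31 0
55 0 0 2830944 56064 2 649 656 3 806 0 121 0 0 0 0 1441 4627 989 69 31 0
57 0 0 2827704 48760 4 818 723 6 800 0 121 0 0 1 0 1606 4316 1160 66 34 0
56 0 0 2824712 47512 6 857 604 56 1736 0 261 0 0 1 0 1584 4939 1086 68 32 0
58 0 0 2813400 47056 7 856 673 33 2374 0 355 0 0 0 0 1676 5112 1114 70 30 0
60 1 0 2816712 49464 7 861 720 6 731 0 110 7 0 3 0 2329 6131 1067 64 36 0
58 0 0 2817552 48392 4 585 521 0 996 0 146 0 0 0 0 1357 6724 1059 71 29 0
"""


def summarize_vmstat(text):
    """Summarise run queue and cpu columns, ignoring the first sample."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, samples = lines[0], lines[1:]
    samples = samples[1:]        # the first data line is the summary since boot
    r_col = header.index("r")    # run queue column
    # "sy" appears twice in the header (system calls and system time),
    # so the cpu columns us/sy/id are taken from the end of each row.
    run_queue = [int(s[r_col]) for s in samples]
    user = [int(s[-3]) for s in samples]
    system = [int(s[-2]) for s in samples]
    idle = [int(s[-1]) for s in samples]
    count = len(samples)
    return {
        "samples": count,
        "min_run_queue": min(run_queue),
        "avg_run_queue": sum(run_queue) / count,
        "avg_user": sum(user) / count,
        "avg_system": sum(system) / count,
        "avg_idle": sum(idle) / count,
    }


summary = summarize_vmstat(VMSTAT_SAMPLE)
for name, value in summary.items():
    print(f"{name:14s} {value:8.1f}")
```

On this sample every retained interval has a run queue between 55 and 60 and zero idle time, which is the pattern the text describes. What counts as an excessive run queue depends on the number of cpus, which the sample does not state, so the sketch reports figures rather than applying a threshold.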

2.3.2. Using the mpstat command

mpstat reports per-processor statistics, with each row of the table representing the activity of one processor.

    $ mpstat 5
    CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
      0   20   0 3592 3350 2338 1355   43  184  285   0  4578   9   6  1  84
      1   19   0  304  465  283 2139  135  398  140   0  6170   9   6  1  85
      2   25   0  352  507  295 2153  158  433  183   0  7508  12   7  1  81
      3   26   0  357  513  302 2082  155  425  181   0  7460  12   7  0  81
    CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
      0    3   0 3879 3773 2754 1832   61  322  339   0  3424  12   7  0  81
      1    2   0  555  544  264 3040  197  670  112   0  4828  15   6  0  78
      2   11   0  188  595  269 3141  219  738  121   0  5291  18   6  1  75
      3   65   0  185  585  279 2660  211  673  110   0  5420  22   9  0  69
    CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
      0    6   0 4028 3633 2620 1695   51  287  343   0  2857  12   8  0  80
      1    7   0  150  545  265 3044  196  663  117   0  4374  14   4  0  81
      2   14   0  226  602  279 2823  225  707  103   0  4715  22   4  1  73
      3    2   0  125  600  282 2810  230  699  118   0  4665  18   4  0  78

mpstat identifies what each CPU is spending its time doing: for example, the distribution of system/user/wait/idle time, system calls made, lock contention, interrupts, faults and cross calls. See the mpstat(1M) man page for details of each column.
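
Section 2.3 suggested looking for resources of the same type that are not equally busy. The Python sketch below applies that idea to the mpstat sample above by averaging one column per CPU across the reports. It is an illustration only: the function name per_cpu_average is hypothetical, the data is copied from this section, and the first report is skipped on the assumption that, as with vmstat, it summarises activity since boot.

```python
# Reports copied from the mpstat 5 sample in this section.
MPSTAT_SAMPLE = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 20 0 3592 3350 2338 1355 43 184 285 0 4578 9 6 1 84
1 19 0 304 465 283 2139 135 398 140 0 6170 9 6 1 85
2 25 0 352 507 295 2153 158 433 183 0 7508 12 7 1 81
3 26 0 357 513 302 2082 155 425 181 0 7460 12 7 0 81
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 3 0 3879 3773 2754 1832 61 322 339 0 3424 12 7 0 81
1 2 0 555 544 264 3040 197 670 112 0 4828 15 6 0 78
2 11 0 188 595 269 3141 219 738 121 0 5291 18 6 1 75
3 65 0 185 585 279 2660 211 673 110 0 5420 22 9 0 69
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 6 0 4028 3633 2620 1695 51 287 343 0 2857 12 8 0 80
1 7 0 150 545 265 3044 196 663 117 0 4374 14 4 0 81
2 14 0 226 602 279 2823 225 707 103 0 4715 22 4 1 73
3 2 0 125 600 282 2810 230 699 118 0 4665 18 4 0 78
"""


def per_cpu_average(text, column, skip_first=True):
    """Average one mpstat column per CPU across the reports in `text`."""
    reports = []      # one {cpu_id: value} dict per report
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":            # a header line starts a new report
            header = fields
            reports.append({})
        elif header is not None:
            reports[-1][int(fields[0])] = int(fields[header.index(column)])
    if skip_first and len(reports) > 1:
        reports = reports[1:]             # assumed to be the since-boot summary
    return {cpu: sum(r[cpu] for r in reports) / len(reports)
            for cpu in sorted(reports[0])}


for column in ("xcal", "intr", "usr"):
    averages = per_cpu_average(MPSTAT_SAMPLE, column)
    busiest = max(averages, key=averages.get)
    print(column, averages, "highest on CPU", busiest)
```

In this sample CPU 0 takes far more cross calls and interrupts than the other three processors, while user time is spread fairly evenly; whether that imbalance matters depends on the workload, so treat the output as a pointer for further investigation rather than a diagnosis.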

2.3.3. Using the iostat command

iostat reports disk usage. Each row of the table represents the activity of one disk. Useful options include:

    -n        identify disks according to cXtYdZ
    -x        report extended statistics
    -z        (new in Solaris 8) omit lines where no disk activity has taken place in the sampling interval; this shortens the output and highlights active disks
    -p, -P    report per-partition I/O statistics, which are useful when looking at swap devices
    -E        useful for identifying disks which are generating errors

iostat also reports activity over NFS, which can make the output rather long.

2.3.4. The day cometh, truss is your friend

truss(1) is a utility which executes a specified command and produces a trace of the system calls it performs, the signals it receives, and the machine faults it incurs. truss can also follow the execution of an existing process. truss is a very useful tool for narrowing down which resources an application requests from the kernel are slow or are used to excess. If you don't know about truss, then read the man page and have a play. The -m option is very useful for showing faults such as page faults. The -c option gives a summary of system calls, faults and signals, with the cumulative time spent in each system call type and the number of failed system calls.

2.3.5. Lockstat - contention for resources

Kernel locks protect data structures from simultaneous updates and control access to resources such as disk, network and various kernel caches. lockstat, which appeared in 2.6, executes a command and reports all kernel lock activity for the duration of the command, irrespective of the process or device which made the request for a lock. In Solaris 8 it allows the kernel to be profiled, similar to the undocumented netstat -k option available in previous releases of Solaris. See the lockstat(1M) man page. The option -s 10 reports the stack of the kernel threads contending on each lock.

2.3.6. Trapstat

trapstat is a tool to provide run-time trap statistics on UltraSPARC(R) systems running an otherwise stock Solaris kernel. For I-TLB and D-TLB misses, trapstat can optionally display the amount of time spent in the operating system's TLB miss handler. For interrupt vector traps, trapstat can optionally display the interrupting device. trapstat will be part of Solaris 9, and a binary is available for use on previous releases.

2.3.8. proc tools

The proc tools are utilities that exercise features of /proc, reporting attributes of a process such as: pstack, the call stack; ptree, a tree of process relationships; pfiles, a list of open file descriptors; and pldd, a list of dynamic libraries in use by the running process. See man proc(1).

3. Some Commonly Asked Questions and some suggestions

3.1. What does 64-bit Sizing and Capacities buy me?

From a performance point of view, the ability to run 64-bit applications has two main benefits. One is that much larger problems can be solved efficiently using a bigger process address space; the other is that integer arithmetic computations get to use 64-bit registers and operations. Overall, programs get slightly larger due to larger pointer values in code and data structures. This in turn means that CPU caches are a little less likely to have enough cache lines, and a slight slowdown might occur in programs that could run just as well in a 32-bit environment. Kernel thread stacks are 16k rather than 8k, though the effect is usually negligible.

3.2. Free Memory

Examining a Solaris system to determine the amount of free memory has traditionally been an area of confusion. For Solaris[TM] releases before 8, to look for a shortage of memory, do not rely upon the "free" column or the sr column of vmstat alone. The value in the "free" column is not an indication of a lack of memory: the page cache holds on to pages in case they may be needed again, and the VM subsystem will only reclaim memory when needed.
Much has been written on this subject in the SunWorld articles and "Sun Performance and Tuning".

To determine if there is a lack of memory, examine the 12th column of vmstat, "sr" (scan rate), in conjunction with I/O traffic to the swap partitions on disk (with iostat -P). The sr column may show high figures if a large amount of I/O is being generated through the filesystem and the page scanner needs to run to free up pages for I/O. The pageout scanner runs only when the free list shrinks below a threshold (lotsfree, in pages). Any process or file that is inactive and not locked in memory may be paged out. The size of the freelist will appear to shrink and will then remain at that value (lotsfree). The page daemon starts to scan for memory to be reclaimed from the page cache and from exited and idle processes when the amount on the freelist drops below the lotsfree threshold. The "free" value cannot grow much above the threshold, because the page scanner does no work to reclaim memory beyond the threshold. It is more efficient for pages to be left in the page cache than needlessly put on the free list.

Solaris 8 cleans up much confusion in this area and implements a more efficient algorithm within the segmap driver to provide the pages required for I/O. The "free" column in vmstat now really does mean memory which is free and not used by the page cache. The -p option has been added to vmstat to give a more accurate breakdown of paging behaviour. For individual processes, the pmap command reports the address space layout of a process (the -x option is useful).

3.3. Priority paging

Priority paging was introduced with Solaris 7 and was backported to Solaris 2.6 (kernel patch 105181-09) and Solaris 2.5.1 (kernel patch 103640-25). Priority paging provides an improved paging algorithm which can significantly enhance system response when the file system is being used. Priority paging introduces an additional water mark, cachefree.
The paging parameters are now:

    minfree < desfree < lotsfree < cachefree

By default the new behaviour is turned off in Solaris 2.5.1, 2.6 and 7, so it is important to enable this functionality on systems that are paging noticeably. cachefree is set to lotsfree if priority_paging is not enabled. If it is enabled, then cachefree is set to 2 times lotsfree by default. Enabling priority paging tends to make switching between windows on desktop systems faster, and is a big help for systems running databases that read large files into memory from the filesystem. For systems which do a large amount of I/O through a filesystem, speed increases of several hundred percent have been seen for compute-intensive jobs with a large dataset.

Solaris 8 uses a different algorithm which removes the limiting factor of previous releases, where the page scanner had to scan for memory to supply the segmap driver with memory in which to place I/O. All pages that segmap no longer uses are put on a list allowing immediate reuse. Do not set priority_paging in Solaris 8. In addition, Solaris 8 should not require tuning of virtual memory parameters, except on large systems where setting fastscan and maxpgio to higher values may be useful.

For more information on priority paging refer to the following URLs:

    http://www.sunworld.com/swol-11-1998/swol-11-perf.html
    http://www.sun.com/sun-on-net/performance/priority_paging.html

INFODOC ID: 17946 has more details on priority paging.

3.4. Intimate Shared Memory (ISM)

With ISM, shared memory is locked in memory and cannot be paged out. Memory management data structures that are normally created on a per-process basis are created once and then shared by every process. In Solaris 2.6 a further optimisation takes place, as the kernel tries to find 4-Mbyte contiguous blocks of physical memory that can be used as large pages to map the shared memory. This greatly reduces memory management unit overhead.
(p. 333, "Sun Performance and Tuning - Java and the Internet")

By default, applications such as Oracle, Informix and Sybase use a special flag to specify that they want intimate shared memory (ISM). Intimate shared memory is an important optimisation that makes more efficient use of the kernel and hardware resources involved in the implementation of virtual memory, and provides a means of keeping heavily used shared pages locked in memory. Intimate shared memory is enabled by default and there is no need to edit the /etc/system file to turn on this feature. With a currently patched kernel, turning off ISM can cause system degradation and possibly a hang condition. In addition, database configuration files, such as Oracle's init.ora file, should not contain "use_ism=false", which turns it off.

3.5. Swap Configurations Related to Shared Memory

To understand swap configurations related to shared memory, see SunWorld Online, "Swap space implementation part 2" by Jim Mauro. The two primary considerations in setting swap space size are:

  a. Have enough memory to avoid swapping in common operation
  b. Have enough swap to get a crash dump

3.6. Interprocess Communication Parameters (IPC)

The values for the following IPC parameters need to be determined by the Database Administrator (DBA). Sun Solution Centres cannot give recommendations for what the actual IPC parameter settings should be. These values are application dependent.

It is *very* easy to mis-type the /etc/system setting for IPC parameters, which can have a significant performance impact on the application. Make a trawl through /var/adm/messages for a message of the form:

    genunix: [ID 492708 kern.notice] sorry, variable 'seminfo_semopn' is not defined in the 'semsys'

which indicates a typo in the line. Grep for "sorry".

Solaris 8 has more reasonable defaults for these values than previous releases. For Solaris releases previous to 2.6, more swap space (as "backing store") is needed for shared memory.
Using swap -l, divide the block counts (which are in 512-byte blocks) by 2 to get kilobytes, or by 2048 to get megabytes. The swap available should be at least 2 times the amount of allocated shared memory (shmmax).

Here are the default and maximum values for shmmax:

              Default           Maximum
    shmmax    1048576 (1 MB)    4294967295 (4 GB)    2.5.1, 2.6, 32-bit Solaris 7
                                2147483647 (2 GB)    2.5 or lower

    Solaris 2.6         shmmax and shmmin are unsigned ints (32 bit).
    Solaris 7 "32-bit"  shmmax and shmmin are unsigned ints (32 bit).
    Solaris 7 "64-bit"  shmmax and shmmin are unsigned longs (64 bit).

In all cases, shmmni and shmseg are signed ints (31 bit).

shmmax limits the maximum size of a shared memory segment, which is the largest value which can be requested of shmget(2). The resource it controls is not preallocated; it is allocated on demand. Solaris 7 and 8 64-bit break the 4GB barrier. The maximum size is theoretical. The actual settings need to be based on system resources such as memory, and on database sizes and configurations. The maximum size of the segment itself (shmmax) is an upper limit.

Appendix A. IPC articles in SunSolve SRDBs and InfoDoc Collections

There are numerous articles that have been written by the Sun Solution Centres on the subject of IPC parameters. They are available on the SunSolve web site (sunsolve.sun.com) after logging in as a contract customer.
Here is a partial list:

If modifications to the /etc/system file do not seem to have taken effect, read the following:

    SRDB ID: 12824: sysdef -i does not report IPC parameters set in /etc/system

For general information on the IPC parameters:

    INFODOC ID: 13421: Shared Memory Commands Explained
    INFODOC ID: 6328: All about Shared Memory Parameters in 2.X
    INFODOC ID: 2270: Understanding semaphores, seminfo_ semaphore info
    INFODOC ID: 13523: Semaphores Explained
    SRDB ID: 12075: How to configure the IPC semaphores and shared memory
    SRDB ID: 5288: How to determine the IPC semaphore parameter values
    INFODOC ID: 2273: Kernel tuning parameters for message queues
    INFODOC ID: 7241: Determine the message queue parameters

For debugging problems:

    SRDB ID: 12174: How to check how much shared memory is used by system
    INFODOC ID: 13480: How to use the UNDO feature in semaphore operations
    SRDB ID: 16985: A process using shared memory has terminated, but swap space doesn't seem to get reclaimed.

B. Sun Performance Information

The Sun Performance Information pages are available using these URLs:

    http://www.sun.com/sun-on-net/performance.html
    http://www.sun.com/sun-on-net/itworld

They have links to articles written by Adrian Cockcroft, including all of his SunWorld Online Q&A Columns, and to the SE Toolkit.

"Sun Performance and Tuning - Java and the Internet" by Adrian Cockcroft is a useful text whose principles stand the test of time. However, any numbers in it should be treated with suspicion when applied to current systems.

For more information see "Porting Performance Tools to 64bit Solaris White Paper" (URL http://www.sun.com/sun-on-net/performance.html).

A number of Blueprints related to performance and other aspects of system management, such as capacity planning, are available at http://www.sun.com/blueprints.

C. Useful SunWorld Online Articles

Jim Mauro has written numerous good articles on how various components of the Solaris kernel really work.
They are a very good place to start if you want to know what goes on "under the hood".

    http://www.sun.com/sunworldonline

D. Useful Books

"Solaris Internals, Core Kernel Architecture" by Jim Mauro and Richard McDougall is an excellent in-depth guide to the inner workings of Solaris.

"Oracle8 & Unix Performance Tuning" by Ahmed Alomari, 1999, has a wealth of information and covers Solaris 2.6.

"Techniques For Optimizing Applications: High Performance Computing" by Rajat P. Garg and Ilya Sharapov, 2001. This book is a practical guide to performance optimization of computationally intensive programs on Sun UltraSPARC[tm] platforms and can be useful in gaining an understanding of how applications utilize system resources.

Copyright 1994-2004 Sun Microsystems