In the performance data for 8th December at 00:05 hours, I could see that the fault rate was too high. Users sometimes face issues with system response. We have around 300 GB of DLOs on the system, and certain jobs browse through these DLOs. These appear as QZRCSRVS jobs (NITIBVP).
I am still investigating an issue that happened on 27th October 2009. Business was seriously impacted by slow system response between 14:00 and 18:00 hours. I would appreciate your help in finding the root cause.
As best I can tell from just the PDF data, there was an interactive job significantly impacting response time during this period. While there was still plenty of CPU available (as shown on page 1), page 2 shows an increase in disk activity that aligns with this time frame. Page 3 shows a significant amount of non-database memory faulting. Page 4 shows the huge jump in interactive response time. The top right of that page shows that the peak average interactive response time for the day was 12.48 seconds. This is really bad. Unfortunately, this PDF focuses on which jobs impacted the CPU the most, and this was not a CPU issue. Go into the System Navigator from the EZRAD WORKPERF menu. Find this period of time and use "5=Work with" and "8=Details" to get more information about what happened and what impacted the system the most at that point in time.
From the System Navigator I could see that the job to end inactive sessions is listed against the bad response time. Could you please let me know what kind of jobs would cause non-database memory faulting? Are these the jobs that generate too many spooled files?
Jobs like ENDACTJOB can be responsible for all of this non-database memory faulting. Database faulting comes from high database I/O. Non-database faulting comes from everything else - jobs starting and stopping, programs loading in and out of memory, program logic executing, etc. These end inactive session jobs account for 27% of all jobs on your system and consume 7% of your CPU. This is a bit ironic. You are submitting 6,000 jobs per day to end active jobs - presumably to limit the number of active jobs left on the system. If you did not run this process, you would likely have a lot fewer jobs running on the system. There are other ways to do this though. There are capabilities in the operating system to set session timeout values. Alternatively, this process could be one ENDACTJOB job that starts at IPL time and runs all week long, ending expired jobs and then going into a DLYW status until the next check. The unnecessary job initiation and termination has a significant impact on the system, and it is possibly generating joblogs each time, which impacts the system even more. This is not a big change, but it has huge benefits. Either use the IBM timeout values or create/modify a fairly simple CL program.
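To put the figures above in perspective, here is a rough back-of-the-envelope calculation in Python. It only uses the two numbers quoted in this reply (6,000 submitted jobs per day, 27% of all jobs). The assumption that submissions are spread evenly over 24 hours is mine and is only there to give an average; the real schedule may be different, and the 27% is a rounded figure, so treat the results as approximate.

```python
# Figures quoted in the reply above.
JOBS_SUBMITTED_PER_DAY = 6_000   # end-inactive-session jobs submitted per day
SHARE_OF_ALL_JOBS = 0.27         # those jobs as a fraction of all jobs (rounded)
SECONDS_PER_DAY = 24 * 60 * 60


def total_jobs_per_day(submitted, share):
    """Total daily job count implied by the submitted count and its share."""
    return submitted / share


def seconds_between_submissions(submitted, seconds_per_day=SECONDS_PER_DAY):
    """Average gap between submissions.

    ASSUMPTION: submissions are spread evenly over the whole day.
    """
    return seconds_per_day / submitted


def job_starts_avoided(submitted):
    """Job initiations saved per day if one long-running job replaced them all."""
    return submitted - 1


total = total_jobs_per_day(JOBS_SUBMITTED_PER_DAY, SHARE_OF_ALL_JOBS)
gap = seconds_between_submissions(JOBS_SUBMITTED_PER_DAY)
avoided = job_starts_avoided(JOBS_SUBMITTED_PER_DAY)

print(f"Implied total jobs per day: about {total:,.0f}")
print(f"Average gap between submissions: {gap:.1f} seconds")
print(f"Job starts avoided per day with one long-running job: {avoided:,}")
```

In other words, on average a new job is being initiated and terminated every few seconds just to check for sessions to end, which is the overhead a single long-running job sitting in DLYW between checks would avoid.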