file system full

Resolving Disk Space Utilization Issues: A filesystem is reported as having excessive utilization and is in danger of becoming full. For many customer accounts there are Production Documents containing information about known issues that may be causing the situation, along with specific instructions on how to resolve them. Those documents should be the primary and authoritative source referred to when resolving these issues. This document is written as a generic troubleshooting guide and does not cover that customer-specific information or those instructions.

Examples given are based on RHEL; however, the concepts are portable to any *nix OS.

A Note about Reserved Space: Ext (and other) filesystems are created by default with 5% of their capacity reserved for the root user. This ensures that the system can continue to function administratively despite excessive space utilization by non-root processes and users, allowing the OS to keep operating and even allowing the situation to be mitigated by automated or manual means. For filesystems where the root user must be able to allocate space on disk, such as /var, having a portion of the disk reserved is important.
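To see what that reserve amounts to on a given filesystem, the superblock can be inspected without touching the data itself. The device name below is illustrative; substitute the block device actually backing the filesystem in question:

dumpe2fs -h /dev/mapper/vg00-data | grep -iE 'block count|reserved block'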

However, the default 5% may be an excessive reserve. It reflects an earlier time when disks were smaller and fragmentation more likely. Also note that while the space is reserved for uid/gid 0 ("root") by default, this can be changed. On some filesystems this reservation simply isn't needed or useful, most obviously on filesystems used solely for application data, such as those holding databases, logs and so forth. The root user never writes to these filesystems, so the root reserve is simply not necessary, and yet these are the very filesystems that are most likely to become over-utilized. In some cases, immediate relief from a 'file system full' error can be achieved by using tune2fs to reduce the root reserve. This can allow the system to avoid or recover from an immediate outage and provide time (and space) to undertake a more fundamental repair of the underlying cause of the excessive utilization.
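As a sketch of that immediate relief, assuming an ext3/ext4 filesystem on a hypothetical device /dev/mapper/vg00-data, the reserve could be dropped to 1% and later restored to the default 5%:

tune2fs -m 1 /dev/mapper/vg00-data
tune2fs -m 5 /dev/mapper/vg00-data

The first command frees roughly 4% of the filesystem's capacity for non-root use immediately; the second puts the reserve back once the underlying cause has been addressed.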

Once RTS (return to service) is accomplished, the reserved space should be reset to its original value. There are a few common scenarios that can account for free disk space exhaustion:

  1. Sudden, unexpected growth
  2. Unexpected and unplanned database growth
  3. Verbose log files

It is important to consider the impact of your investigation on the operational capability and availability of the system. du is a very costly process, requiring disk I/O for each file or directory it evaluates as it traverses the filesystem. Depending on the composition of the filesystem, this can exhaust device I/O capacity even on a previously idle system. It also flushes application data from the disk buffers, which is inadvisable. As such, du should only be run once and the data captured for analysis (see below).

In the case of sudden unexpected growth, one might infer that the device I/O is in fact not idle. Adding a large du operation will clearly cause contention for resources and will negatively impact the system's production performance. It is always wise to review the current and historical load on the device using tools like top, sar and iostat. A properly tuned system should have virtually no disk wait; disk wait in excess of 3-5% should be considered potentially dangerous. It indicates a system that is already under stress, and adding the load of a du operation will increase contention for the device and its associated caches. As a result the disk wait will rise non-linearly, negatively impacting system performance and availability.

Using top you can get a quick view of the current, instantaneous disk wait. Using iostat you can get a current periodic summary, and by reviewing data collected by sar you can get historical periodic summaries that help determine the best Time Of Day (TOD) to perform a du-based investigation. A full discussion of these tools is beyond the scope of this document. If the system is experiencing disk wait, you will want to determine the actual device that is under stress; use iostat to view I/O by device. If the device under stress is on the same I/O path as the device with excessive disk space utilization, continue with extreme caution or you could cause a partial or even complete outage.
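As a minimal sketch, assuming the sysstat package is installed (standard on RHEL), the current and historical picture can be gathered like this:

iostat -x 5 3     # extended per-device statistics; watch the %util and await columns
sar -u            # today's CPU history; the %iowait column shows disk wait over time
sar -d -p         # today's per-device history, with device names resolved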

In such a situation the use of ionice, if available, might seem useful. However, if there is already disk wait, any allocation of device I/O to the du process will cause further contention, so running du under ionice in the idle or best-effort classes will both increase stress and cause the du to run for a very long time; it is inadvisable. In such a situation, if the system is part of a cluster or load-balanced pool, it may be advisable to remove it from active service during RTS. This eliminates the production load and the impact to the customer's service.

Fortunately there are alternatives to costly disk operations. Given that the filesystem growth in this case is sudden and significant enough to cause free space exhaustion, it is highly likely that the offending file or files are currently open and being written to. In that case the kernel is already aware of the space allocation by file, and this can be reviewed with lsof without any disk I/O and without pushing data out of cache/buffers. This will also reveal un-linked (deleted) files that are still open by a process and consuming space. The following command provides a listing of all un-linked (deleted) but still open files, sorted with the largest at the top:

lsof / | grep -i del | sort -rnk7
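A hedged variation: lsof's +L1 option lists only files whose link count is below one, which is exactly the un-linked-but-open case; the size column (column 7) remains the sort key:

lsof +L1 / | sort -rnk7 | head

The space held by such a file is returned only when the process holding it closes the descriptor, typically by restarting (or, where supported, reloading) the process shown in the lsof output.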

Slow Growth Over Time: some files require an archival/rotation plan to keep a system healthy. Common examples (a logrotate sketch follows this list):

  • Mail to local accounts such as root (requires proper mail forwarding/transport)
  • Old un-rotated log files
  • Many rotated log files
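A minimal logrotate sketch for such files, assuming a hypothetical application logging under /var/log/bigapp; the path, schedule and retention are illustrative and would need to match the customer's archival requirements. This rotates weekly, keeps eight compressed copies, and tolerates a missing or empty log:

# /etc/logrotate.d/bigapp -- hypothetical example, not a customer-specific policy
/var/log/bigapp/*.log {
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
}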

In this situation there may be a large number of relatively small or moderately sized files being rotated or created periodically. The bulk of these files will not be open for write, so the lsof method will not locate them; they may accumulate over time, unnoticed and without a plan. If there is very little disk I/O we may be free to make more use of the device, but there are still constraints. The overhead of a du operation is very expensive when the cause is many files, each of which du must stat, and even an idle filesystem's I/O capacity can be exhausted by an extensive du run. The previous recommendation to take the system out of service may still be well advised.

The command below gathers data on individual files, with summaries for directories and sub-directories, restricted to the device mounted on / (the root filesystem). You can change the argument passed to du to the mount point of the device under scrutiny, such as /var. We then sort this output with the largest entries on top. Note that we use tee to store a copy of the entire output in /tmp/du.out; this allows us to review the data without repeatedly executing the costly du command. Re-running du would also persist its data in the disk buffers to the exclusion of application data, which would negatively impact the customer's performance.

du -ax / | sort -rn | tee /tmp/du.out | less
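Subsequent analysis can then be run against the saved copy rather than the live filesystem; the paths and threshold below are illustrative (du reports sizes in 1 KiB blocks by default):

head -20 /tmp/du.out                # the twenty largest files and directories
awk '$1 > 1048576' /tmp/du.out      # entries larger than roughly 1 GiB
grep '/var/log' /tmp/du.out | head  # the largest entries under a particular tree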

A Note about Copy/Truncate: copy/truncate places data at risk, even with files that are only appended to, such as log files. There is a period of time between when the cp command terminates and when the truncation occurs; data written to the open file during that window is lost. (Note that logrotate mentions this danger in its manual page, under its 'copytruncate' method.)

To ensure that the shortest possible time passes between the completion of the copy and the truncation, the two commands must be stacked together. However, do not stack them with the unconditional ';' shell punctuation; that places the data at grave risk. Use '&&' instead, so that the truncation occurs *only* if the copy operation was successful. In the example below, the first copy fails (the relative path var/temp does not exist), so the file is not truncated and the error branch fires; the corrected command then succeeds:

cp bigfile.log var/temp/bigfile.log.`date +%Y%m%d` && : > bigfile.log || echo 'ERROR: File Copy Failed!!!'

cp: cannot create var/temp/bigfile.log.20130806: No such file or directory
ERROR: File Copy Failed!!!

cp bigfile.log /var/tmp/bigfile.log.`date +%Y%m%d` && : > bigfile.log || echo 'ERROR: File Copy Failed!!!'

ls -l /var/tmp/bigfile.log.*

-rw-r--r-- 1 z595847 contractor 426192896 Aug 6 09:53 /var/tmp/bigfile.log.20130806

Here is an example of a real-world situation in which copy & truncate caused data to be both added and lost. This is the command that was used to copy and truncate the log file:

[root@www3]~# gzip -9cv mail.log > /local/mail.log.20131117XXXX.gz ; echo " " > mail.log

Here is the result of the above:

[root@www3]~# head -3 /var/log/mail.log

v 17 10:26:37 www3 sm-msp-queue[1498]: rAEJAOal006641: to=postmaster, delay=2+19:57:48, xdelay=00:00:00, mailer=relay, pri=23217457, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1]

Nov 17 10:26:37 www3 sm-msp-queue[1498]: rAEJAOag006641: to=postmaster, delay=2+19:57:48, xdelay=00:00:00, mailer=relay, pri=23217457, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1]

There are two things to notice here.

  • First is the empty line caused by the 'echo " "'. This added data to the log file and should not have been done.
  • Second, we can see clearly that the first entry written to the log file by syslogd is truncated and missing the beginning of the line (the "No" of "Nov").

The log file in this particular situation was being written to approximately twice a second. This illustrates that there does not have to be an extreme volume of writes to a file for the copy & truncate method to lose data. Note also that the ';' syntax was used to stack the commands unconditionally: had the gzip or the redirection failed for *any* reason, the log file data would have been irretrievably lost. What should have been done in this case was to rename the log file and then restart the syslog daemon, which rotates the file cleanly with zero data loss or insertion:

[root@www3]~# mv /var/log/mail.log /var/log/mail.log.20131117XXXX && service syslog restart && gzip /var/log/mail.log.20131117XXXX

In this example syslog continues to write to the renamed file until it is restarted, at which point it creates a new mail.log file. While syslog restarts, incoming data is buffered, so none is lost. The commands are conditional to ensure that we do not attempt to compress a log file that is still being written to. No data is lost or added using this safe method.
