Tuesday, October 30, 2007

redhat hang

If your RedHat system gets really overloaded and hangs, here are some things that might help:
  • SysRq: SysRq is a key combo you can hit which the kernel will respond to regardless of whatever else it is doing, unless it is completely locked up.
  • Hangwatch: Hangwatch periodically polls /proc/loadavg, and echoes a user-defined set of characters into /proc/sysrq-trigger if a user-defined load threshold is exceeded.
  • NMI: The Non-Maskable Interrupt (NMI) watchdog is a mechanism to detect system lockups. Enabling it turns on the kernel's built-in deadlock detector: by generating periodic NMIs, the kernel can monitor whether any CPU has locked up and print out debugging messages as needed.
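
The Hangwatch idea above can be sketched in a few lines of Python. This is only an illustration of the polling logic, not Hangwatch's actual code: the function names, the threshold and the key set are assumptions made for the example, and the function that writes the keys is passed in so the logic can be tried without touching the real /proc/sysrq-trigger (which needs root and has real effects).

```python
def read_load(loadavg_path="/proc/loadavg"):
    """Return the 1-minute load average (first field of the file)."""
    with open(loadavg_path) as f:
        return float(f.read().split()[0])


def send_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write a single SysRq key to the trigger file (requires root)."""
    # One key per write, on the assumption that only the first
    # character of each write is acted on.
    with open(trigger_path, "w") as f:
        f.write(key)


def check_once(threshold, keys, load, send=send_sysrq):
    """Send every key in `keys` if `load` exceeds `threshold`.

    Returns the list of keys that were sent (empty if none).
    """
    if load <= threshold:
        return []
    for key in keys:
        send(key)
    return list(keys)
```

A real watcher would call check_once(threshold, keys, read_load()) in a loop with a sleep in between; keys such as "m" (memory info) and "t" (task list) are the sort of thing you would want dumped to the log before a hang.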

Thursday, October 25, 2007

five-minute monitoring

A system independent of my mail server will text my phone if any of the ports on the mail server are not accessible. It checks every 5 minutes, which is about how long it took to whip this thing up.
me@workstation:~$ crontab -l
*/5 * * * * /bin/sh /home/me/code/shell/monitor.sh > /dev/null 2>&1
me@workstation:~$ cat /home/me/code/shell/monitor.sh
#!/bin/sh

EMAIL="my_number@cell_phone_company.com";
HOST="mailserver.domain.tld";
PORTS="25,80,110,143,443,587,993,995";
CMD=`/usr/bin/nmap -p $PORTS $HOST | grep tcp | grep -v open`;

if [ -n "$CMD" ] # if output of command has non-zero length
then
    echo "$CMD" | /bin/mail "$EMAIL";
else
    echo "$PORTS are open on $HOST";
fi
me@workstation:~$
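
If nmap isn't available on the monitoring host, roughly the same check can be sketched with Python's socket module. This is a simplified stand-in, not an nmap replacement: it only tells you whether a TCP connect succeeds, and the function name closed_ports and the 3 second timeout are assumptions for the example.

```python
import socket


def closed_ports(host, ports, timeout=3.0):
    """Return the subset of `ports` that do not accept a TCP connection."""
    failed = []
    for port in ports:
        try:
            # A completed connect() is treated as "open".
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            # Refused, timed out, unreachable, or name lookup failed.
            failed.append(port)
    return failed
```

Calling closed_ports("mailserver.domain.tld", [25, 80, 110, 143, 443, 587, 993, 995]) returns an empty list when everything answers, so a non-empty result is what you would mail to the phone.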

Wednesday, October 24, 2007

DNS MX hacks

In a previous post I talked about mail gateway load balancing by having two MX records in BIND. I mentioned that I'd want the more powerful server chosen a larger percentage of the time.

You would think I could just add another instance of the same host:

MX 10 mta0
MX 10 mta1
MX 10 mta1
but this doesn't work. BIND just ignores the duplicate mta1 entry as redundant. I could make mta2 a CNAME for mta1 and then add mta2 as a third MX record. I've done some tests in a test environment and this works. However, this is a hack, and pointing an MX record at a CNAME is theoretically not permitted (RFC 1034 section 3.6.2).
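
To see what the CNAME trick would buy, here is a small worked example of the expected split. It assumes sending servers pick uniformly at random among MX records of equal preference, which is a simplification; the function name expected_share is made up for the illustration.

```python
from collections import Counter
from fractions import Fraction


def expected_share(mx_targets):
    """Map each real host to its expected share of connections.

    mx_targets maps every equal-preference MX name to the host that
    actually answers for it. Assumes a uniform random pick per delivery.
    """
    counts = Counter(mx_targets.values())
    total = sum(counts.values())
    return {host: Fraction(n, total) for host, n in counts.items()}


# mta2 is the hypothetical CNAME that points back at mta1.
print(expected_share({"mta0": "mta0", "mta1": "mta1", "mta2": "mta1"}))
```

Under that assumption mta1 would see two thirds of the connections and mta0 one third.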

mail gateway load balancing

I have a dedicated mail gateway (mta0) which filters spam. It's been overworked so I set up a second (mta1). mta0 is the master and stores spam definitions and user preferences in a MySQL DB. mta1 is a slave and receives these content updates from mta0.

In order to get both systems sharing the load I simply add a second MX record for mta1. Here are the relevant portions of my zone file before:

$ grep -n MX domain.tld.zone
16:                     MX      10 mta0.domain.tld.
359:mail                   MX      10 mta0.domain.tld.
$                                                                               
and after:
$ grep -n MX domain.tld.zone
16:                     MX      10 mta0.domain.tld.
17:                     MX      10 mta1.domain.tld.
360:mail                   MX      10 mta0.domain.tld.
361:mail                   MX      10 mta1.domain.tld.
$  
BIND rotates the order of equal-preference MX records from one lookup to the next (round robin). E.g. note how mta0 and mta1 alternate at the top for consecutive queries:
$ dig @dns.domain.tld domain.tld +short MX 
10 mta0.domain.tld.
10 mta1.domain.tld. 
$ dig @dns.domain.tld domain.tld +short MX
10 mta1.domain.tld.
10 mta0.domain.tld.  
$ 
It then just takes a little time for your DNS updates to propagate. You can test your changes with mxtoolbox.com, or by sending mail from hosts like gmail and yahoo and viewing the full headers to see which MTA relayed it. Before you drop a second email hub into service, be sure that it sends mail where it should. It would be a shame if half of your mail was lost. Use the following test and make sure the email ends up where you'd expect. You might need to adjust your spam filter to let the test message below through:
telnet mta1 25
HELO workstation.domain.tld
MAIL FROM: me@domain.tld
RCPT TO:me@domain.tld
DATA
test
.
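
The same hand-delivered test can be scripted with Python's smtplib, which is handy if you want to repeat it against each gateway. This is a sketch under assumptions: the function names and the "relay test" subject are invented for the example, and smtplib greets with EHLO rather than the HELO typed above.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient, body="test"):
    """Build a minimal message like the one typed into telnet."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "relay test"
    msg.set_content(body)
    return msg


def send_via(host, msg, port=25, timeout=10):
    """Hand msg directly to one specific MTA, bypassing the MX lookup."""
    with smtplib.SMTP(host, port, timeout=timeout) as server:
        server.send_message(msg)


# Example (needs a reachable server, so it is not run here):
# send_via("mta1", build_test_message("me@domain.tld", "me@domain.tld"))
```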
One nice thing about this is that you can add the second system without any downtime. mta0 does not need to be brought offline; it's just a matter of waiting for DNS to propagate. Since mta1 has twice as much CPU and RAM as mta0, I'm going to look into weighting the records so that mta1 gets more of the load.

Saturday, October 20, 2007

/lib/modules space hack

I made / too small for xubuntu and had trouble upgrading my kernel from 2.6.20-15 to 2.6.20-16.
dpkg: error processing 
/var/cache/apt/archives/linux-restricted-modules-2.6.20-16-generic_2.6.20.5-16.29_i386.deb 
(--unpack):
 failed in buffer_write(fd) (9, ret=-1): backend dpkg-deb during 
`./lib/linux-restricted-modules/2.6.20-16-generic/nvidia_legacy/nv-kernel.o': 
No space left on device
The issue was that /lib/modules/2.6.20-15-generic was taking up too much space (~100M):
$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             236M  214M  9.4M  96% /
varrun                375M  100K  375M   1% /var/run
varlock               375M     0  375M   0% /var/lock
procbususb            375M  100K  375M   1% /proc/bus/usb
udev                  375M  100K  375M   1% /dev
devshm                375M     0  375M   0% /dev/shm
lrm                   375M   33M  342M   9% /lib/modules/2.6.20-15-generic/volatile
/dev/sda7              20G  1.4G   18G   8% /home
/dev/sda5             4.6G  2.3G  2.1G  53% /usr
/dev/sda6             1.9G  800M  982M  45% /var
My klugey fix (this system wasn't too important) was to "rm -r /lib/modules/2.6.20-15-generic" (after I tarred it up on another partition). There were complaints about not being able to remove volatile, but I was counting on that and left it; everything else made way. I did this while running the 2.6.20-15 kernel; I figured that what I would need for this operation was in RAM. The upgrade worked so I was then able to boot off of the 2.6.20-16 kernel. I then removed the 2.6.20-15 kernel. I guess I only have room for one at a time.
$ sudo dpkg -r linux-image-2.6.20-15-generic
Password:
(Reading database ... 102721 files and directories currently installed.)
Removing linux-image-2.6.20-15-generic ...
Running postrm hook script /sbin/update-grub.
You shouldn't call /sbin/update-grub. Please call /usr/sbin/update-grub instead!

Searching for GRUB installation directory ... found: /boot/grub
Testing for an existing GRUB menu.lst file ... found: /boot/grub/menu.lst
Searching for splash image ... none found, skipping ...
Found kernel: /boot/vmlinuz-2.6.20-16-generic
Found kernel: /boot/memtest86+.bin
Updating /boot/grub/menu.lst ... done

The link /vmlinuz.old is a damaged link
Removing symbolic link vmlinuz.old 
Unless you used the optional flag in lilo, 
 you may need to re-run your boot loader[lilo]
The link /initrd.img.old is a damaged link
Removing symbolic link initrd.img.old 
Unless you used the optional flag in lilo, 
 you may need to re-run your boot loader[lilo]

Thursday, October 18, 2007

load watch

If you want to keep your eye on a system and you don't even have cron you can keep this script running. It wakes up every 10 minutes and sends me an email if the load is more than six:
#!/usr/bin/env python
# Filename:                load_watch.py
# Description:             emails me if load too high
# Supported Language(s):   Python 2.5.1
# -------------------------------------------------------- 
import commands
import os
import time
import smtplib

def main():
    while(1):
        frequency = 60 * 10 # ten minutes
        warn_load = 6.0
        loadcmd = 'cut -d " " -f1 /proc/loadavg'
        loadavg = float(commands.getoutput(loadcmd))
        if loadavg > warn_load:
            send_warning(loadavg)
        time.sleep(frequency)
        print loadavg
    
def send_warning(loadavg):
    """emails high load as subject"""
    print "warning"
    domain = 'domain.tld'
    smtpServer = 'mail.' + domain
    fromAddr = 'load_watch@mail.' + domain
    toAddr = 'me@' + domain

    msg = ""
    msg += "To: " + toAddr + "\n"
    msg += "From: " + fromAddr + "\n"
    msg += "Subject: " + 'Mail Load: %s' % loadavg
    msg += "\n\n"

    server = smtplib.SMTP(smtpServer)
    server.set_debuglevel(0)
    server.sendmail(fromAddr, toAddr, msg)
    server.quit()  # close the SMTP session cleanly

if __name__=="__main__":
   main()

Wednesday, October 17, 2007

ping keep shell

If you don't want an SSH session to time out you can leave the following command running.
ping -i 30 $host
This command sends a single ping to $host every 30 seconds by using ping's interval option.

Tuesday, October 16, 2007

blackberry spoofing?

I have a colleague who does user support with blackberry devices. We ended up looking at the headers of a message from one of the new devices that he was testing. He was told that the new device uses a different protocol. The first header looked something like this:
Received: from mail.domain.tld (HELO domain.tld) ([123.456.78.9])
  by as16.bis.na.blackberry.com with ESMTP; 11 Oct 2007 20:29:46 +0000
Note that there's no message ID and I have nothing in my logs from this transaction. A normal message sent to google looks like this:
Received: from domain.tld (mail.domain.tld [123.456.78.9])
        by mx.google.com with ESMTP id  i35si14940528wxd.2007.10.16.11.05.19;
        Tue, 16 Oct 2007 11:05:19 -0700 (PDT)
Note the message ID and that I can confirm an SMTP handshake in my logs:
14:05:20.14 2 SMTP-25607(gmail.com) [12046375] sent to [66.249.83.27:25], 
got:250 2.0.0 OK 1192557920 i35si14940528wxd
In the case of the blackberry, the first header really came from them. I suspect that the device connected to their server over the cellular network to send the mail. Their server then wrote that header to say the message came from us, not them. So, as far as I can tell, this first header gives a misleading account of what actually happened.

Friday, October 12, 2007

DNS Math

; I'm posting this one in Elisp.  
; 
; Someone I work with entered 200710110333 instead of 
; 2007101103 for a serial number field of the SOA RR on 
; our root DNS server.  Once this high number propagated 
; DNS updates with the correct date broke.  
; 
; The problem is that the following is true:

(< 2007101201 200710110333)

; so today's updates didn't propagate to our other servers.  
; 
; According to:  
;  http://www.zytrax.com/books/dns/ch9/serial.html
; 
; "perhaps ritual suicide is the best option" it also says:  
; 
;; The SOA serial number is an unsigned 32-bit field with
;; a maximum value of 2**31, which gives a range of 0 to 
;; 4294967295, but the maximum increment to such a number 
;; is 2**(31 - 1) or 2147483647, incrementing the number 
;; by the maximum would give the same number.  
; 
; Did I read that math right?  I think the key word here 
; is "unsigned".  Checking up on this:  

(insert-string (expt 2 32))
(insert-string (expt 2 31))
(insert-string (expt 2 30))

; Inserts 4294967296, 2147483648 and 1073741824 into the 
; buffer respectively.  Also, the notation is bad, I think they 
; mean  (2**31) - 1 not 2**(31 - 1).  More precisely one of 
; the two:

(- (expt 2 31) 1)
(- (expt 2 32) 1)

; I guess I'll compute both and use this to solve my problem.  
; Let's assume that the following is true:
;
;; An unsigned 32-bit field with a maximum value of 2**31
;
; Actually let's check the RFC:
; 
;; http://www.faqs.org/rfcs/rfc1982.html
; 
; which defines SERIAL as:
;
;; The unsigned 32 bit version number of the original copy of
;; the zone.  Zone transfers preserve this value.  This value
;; wraps and should be compared using sequence space arithmetic.
;            
; As it puts it, "the maximum is always one less than a power of two."

; The DNS-Pro book then says:
; 
;; Using the maximum increment, the serial number fix is a two-step
;; process. First, add 2147483647 to the erroneous value, for example,
;; 2008022800 + 2147483647 = 4155506447, restart BIND or reload the zone,
;; and make absolutely sure the zone has transferred to all the slave
;; servers. Second, set the SOA serial number for the zone to the correct
;; value and restart BIND or reload the zone again. The zone will
;; transfer to the slave because the serial number has wrapped through
;; zero and is greater that the previous value of 4155506447! RFC 1982
;; contains all the gruesome details of serial number comparison
;; algorithms if you are curious about such things.
; 
; OK so what's the real_error value?  It wrapped by 2^32:

(mod 200710110333 (expt 2 32))

; Which makes sense since
; 
; me@workstation:~> dig @nameserver mta1.domain.tld 
; ...
; domain.tld.          3600    IN      SOA     
; nameserver.domain.tld. hostmaster.domain.tld. 
; 3141614717 1800 900 86400 3600
; me@workstation:~>

; So, I'm going to set my root server's SOA SN to 3141614717 

(mod 200710110333 (expt 2 32))

; To get everyone back in sync.  dig verified that they're 
; back in sync in less than an hour:
;
;; for x in dns1 ... dnsN; 
;;   do dig @$x domain.tld SOA +short; 
;; done
;
; I'm then free to set it to:   

(let ((today 2007101201)) 
  (- (expt 2 32) 
     1 
     (abs (- today (- (expt 2 31) 1)))))

; or greater than 0 but less than:  

(let ((error_value 3141614717)) 
  (mod 
   (+ (- (expt 2 31) 1) error_value)
   (expt 2 32)))

; For fun my colleague and I went with 666 and then did 
; an "rndc reload" and then set it to today and reload 
; again. 
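
The comparison that the slaves apply can be sketched in Python to check the numbers above. This follows my reading of the RFC 1982 comparison rule for 32-bit serials; the function name serial_gt is made up for the example.

```python
HALF = 2 ** 31
MOD = 2 ** 32


def serial_gt(a, b):
    """True if serial `a` counts as newer than `b` (32-bit sequence space)."""
    # a is newer when it lies less than 2**31 "ahead" of b, modulo 2**32.
    return a != b and (a - b) % MOD < HALF


stored = 200710110333 % MOD          # what the 32-bit field really held
today = 2007101201

print(stored)                        # 3141614717, as dig reported
print(serial_gt(today, stored))      # False: slaves ignore today's serial
print(serial_gt(666, stored))        # True: 666 is "newer" after wrapping
print(serial_gt(today, 666))         # True: then today's date works again
print((stored + HALF - 1) % MOD)     # 994131068, the largest stepping stone
```

Any intermediate value that is newer than the bad serial and older than today's date would do; 666 just happens to be one of them.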

mail arriving out of order?

If your users complain about mail arriving out of order you can tell them the story below. But if they're complaining that this happens a lot, it's probably a symptom of your MTA being overloaded.

The mail protocol makes no guarantee that your messages will arrive in order.

If the mail server for foo.com tries to relay message-0 sent at 2:00 to mta0.domain.tld and mta0 doesn't have the resources, it can and will refuse the SMTP connection. foo.com will then put message-0 back in its queue, not to be resent for as long as root@foo.com sees fit to configure it. Let's assume this value is 1 hour. foo.com could then try to send message-1 (this is not message-0, which is still in the queue) at 2:05. When it tries to get its SMTP connection this time, mta0 has resources at that moment and accepts the message for delivery. Then at 3:00 foo.com tries to make another connection for message-0 to mta0, which again has resources and accepts the message. So, in the end message-0 sent at 2:00 arrives at 3:00 and message-1 sent at 2:05 arrives at 2:05. All of this is perfectly legal.
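
The story above can be replayed as a tiny simulation. It is deliberately simplified: it assumes a refused message is retried exactly once after the retry interval and that the retry succeeds, and the function name arrival_times is invented for the example.

```python
def arrival_times(messages, retry_minutes=60):
    """Return (arrival_minute, name) pairs sorted by arrival time.

    messages is a list of (name, sent_minute, refused_first_try).
    """
    arrivals = []
    for name, sent, refused in messages:
        # A refused message waits in the sender's queue for one retry interval.
        arrivals.append((sent + retry_minutes if refused else sent, name))
    return sorted(arrivals)


# Minutes after midnight: message-0 at 2:00 (refused), message-1 at 2:05.
print(arrival_times([("message-0", 120, True), ("message-1", 125, False)]))
```

The result lists message-1 at minute 125 (2:05) ahead of message-0 at minute 180 (3:00), which is exactly the reordering users complain about.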

host2dig

I'm trying to get used to using dig instead of host. Here are two simple idioms to start the conversion:

Get an IP from a particular DNS server:

 dig @dns-server $host_name +short

Get a hostname from a particular DNS server:

 dig @dns-server -x $ip_address +short

Then read the DiG HOWTO.

Wednesday, October 10, 2007

mail

This is a handy command for testing:
echo -e "sent on `date`" | mail -s "test: `hostname`" foo@domain.tld
It will send mail quickly from the command line and there's no need to muck about with any clients. After running this you can go see how your message is doing in /var/spool/mqueue etc.

Tuesday, October 9, 2007

tar -C

Would you believe that I got this far without knowing the tar -C option?

If you do the following:

tar xzf foo.tar.gz -C /
then the contents of foo.tar.gz are extracted under /, so if the archive holds a top-level foo directory you end up with /foo.

Note that the -C means change to the specified directory so that the contents of your extract end up there. As the man page says:

       -C, --directory DIR
              change to directory DIR
I normally just put the tarball where I want it extracted and then extract it. My problem tonight was that the partition where I wanted to extract it to was not big enough for both the tarball and its files. I had to extract it to that directory from a different partition. Of course I had to wait until I saw a disk full error to realize this. If I had known about it I would probably be home now and not waiting for a 5G tarball to uncompress into 15G. I'll probably remember it now.
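
For what it's worth, the same "extract somewhere else" move can be done from Python with the standard tarfile module. This is a sketch: the function name extract_into is made up, and like tar itself it should only be pointed at archives you trust.

```python
import tarfile


def extract_into(tarball, dest_dir):
    """Extract tarball into dest_dir, like `tar xzf tarball -C dest_dir`."""
    # "r:*" lets tarfile detect gzip, bzip2 or plain tar by itself.
    with tarfile.open(tarball, "r:*") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths and similar surprises.
            archive.extractall(path=dest_dir, filter="data")
        else:
            archive.extractall(path=dest_dir)
```

As with -C, the tarball can stay on one partition while its contents land on another.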

Monday, October 8, 2007

tcpdump

If you're debugging network services between two hosts don't forget to let tcpdump help you.

E.g. if host1 can't SSH to host2 and you think an external firewall is blocking you, try this:

1. Have host2 display TCP info on port 22 and display only results containing host1:

host2$ tcpdump -i eth0 tcp port 22 | grep host1
2. Try to SSH from host1 to host2.

If an external firewall is blocking them you should see nothing from the command above. However, if you see something like the following, then the external firewall isn't blocking you and it's an issue between the two hosts:

host2$ tcpdump -i eth0 tcp port 22 | grep host1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
14:08:50.412007 IP host1.domain.tld.54403 >
host2.domain.tld.ssh: S 2707949880:2707949880(0) win 5840 
14:08:53.411702 IP host1.domain.tld.54403 >
host2.domain.tld.ssh: S 2707949880:2707949880(0) win 5840 
14:08:59.411422 IP host1.domain.tld.54403 >
host2.domain.tld.ssh: S 2707949880:2707949880(0) win 5840 

In the case above host2 is unable to get back to host1 to complete the TCP handshake. In this case you could try reaching host1 from host2 and debug the resulting issue (e.g. broken netmask on host2).

A healthy looking tcpdump from host2 would look like this:

host2$ tcpdump -i eth0 tcp port 22 | grep host1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
14:11:12.101381 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: S 1120004832:1120004832(0) win 5840 
14:11:12.101391 IP host2.domain.tld.ssh >
host1.domain.tld.54432: S 320695995:320695995(0) ack 1120004833 win
5792 
14:11:12.101498 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: . ack 1 win 183 
14:11:12.107969 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1:24(23) ack 1 win 1448 
14:11:12.108063 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: . ack 24 win 183 
14:11:12.108165 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 1:23(22) ack 24 win 183 
14:11:12.108173 IP host2.domain.tld.ssh >
host1.domain.tld.54432: . ack 23 win 1448 
14:11:12.108341 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 23:663(640) ack 24 win 183

14:11:12.108347 IP host2.domain.tld.ssh >
host1.domain.tld.54432: . ack 663 win 1768 
14:11:12.109143 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 24:664(640) ack 663 win 1768

14:11:12.109313 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 663:687(24) ack 664 win 223

14:11:12.111342 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 664:816(152) ack 687 win 1768

14:11:12.113385 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 687:831(144) ack 816 win 223

14:11:12.119580 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 816:1280(464) ack 831 win 1768

14:11:12.121828 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 831:847(16) ack 1280 win 263

14:11:12.162117 IP host2.domain.tld.ssh >
host1.domain.tld.54432: . ack 847 win 1768 
14:11:12.162204 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 847:895(48) ack 1280 win 263

14:11:12.162219 IP host2.domain.tld.ssh >
host1.domain.tld.54432: . ack 895 win 1768 
14:11:12.162262 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1280:1328(48) ack 895 win 1768

14:11:12.163845 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 895:959(64) ack 1328 win 263

14:11:12.164001 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1328:1392(64) ack 959 win 1768

14:11:12.166947 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 959:1055(96) ack 1392 win 263

14:11:12.168083 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1392:1456(64) ack 1055 win 1768

14:11:12.170456 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 1055:1151(96) ack 1456 win 263

14:11:12.170493 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1456:1520(64) ack 1151 win 1768

14:11:12.170635 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 1151:1519(368) ack 1520 win 263

14:11:12.171024 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1520:1840(320) ack 1519 win 2088

14:11:12.180209 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 1519:2159(640) ack 1840 win 263

14:11:12.183403 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1840:1872(32) ack 2159 win 2408

14:11:12.183659 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 2159:2223(64) ack 1872 win 263

14:11:12.187702 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1872:1920(48) ack 2223 win 2408

14:11:12.187923 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: P 2223:2671(448) ack 1920 win 263

14:11:12.201571 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1920:1968(48) ack 2671 win 2728

14:11:12.201596 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 1968:2080(112) ack 2671 win 2728

14:11:12.201691 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: . ack 2080 win 263 
14:11:12.229982 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 2080:2144(64) ack 2671 win 2728

14:11:12.239626 IP host2.domain.tld.ssh >
host1.domain.tld.54432: P 2144:2208(64) ack 2671 win 2728

14:11:12.239719 IP host1.domain.tld.54432 >
host2.domain.tld.ssh: . ack 2208 win 263 

If you read the above you can see host2 ack'ing 1120004833 (host1's initial sequence number plus one) in its SYN-ACK, and host1's ack in the third packet completing the handshake. There are plenty of tcpdump examples on the Interblag.
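
The reasoning in this post (nothing seen, SYNs with no reply, or a full handshake) can be sketched as a small classifier over tcpdump's text output. It assumes the older output format shown here, with each packet rejoined onto one line; newer tcpdump versions print flags differently. The function name diagnose and its three return strings are invented for the example.

```python
import re

# Matches lines such as:
#   14:08:50.412007 IP host1.domain.tld.54403 > host2.domain.tld.ssh: S ...
PACKET = re.compile(r"IP (\S+) > (\S+): (\S+)")


def diagnose(lines, client, server):
    """Classify a capture taken on `server` while `client` tries to connect."""
    syn_seen = False
    synack_seen = False
    for line in lines:
        match = PACKET.search(line)
        if not match:
            continue
        src, dst, flags = match.groups()
        if flags != "S":
            continue  # only SYN and SYN-ACK packets matter here
        if src.startswith(client) and dst.startswith(server):
            syn_seen = True
        elif src.startswith(server) and dst.startswith(client):
            synack_seen = True
    if not syn_seen:
        return "blocked"    # nothing arrived: suspect a firewall in between
    if not synack_seen:
        return "no-reply"   # SYN arrived but server never answered the client
    return "ok"             # SYN and SYN-ACK both seen
```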

Wednesday, October 3, 2007

nice

I experimented with nice values and top and I'm pasting my results here.

When running top note the value of PR vs NI:

PR:

The priority number is calculated by the kernel and is used to determine the order in which processes are scheduled. The kernel takes many factors into consideration when calculating this number, and it is not unusual to see large fluctuations in this number over the lifetime of a process.

NI:

This column reflects the "nice" setting of each process. A process's nice is inherited from its parent. Most user processes run at a nice of 0, indicating normal priority. Users have the option of starting a process with a positive nice value to allow the system to reduce the priority given to that process. This is normally done for long-running cpu-bound jobs to keep them from interfering with interactive processes. The Unix command "nice" controls setting this value. Only root can set a nice value lower than the current value. Nice values can be negative. On Linux they range from -20 (highest priority) to 19 (lowest). The nice value influences the priority value calculated by the Unix scheduler.

To see these values in action I'll start two CPU intensive processes:

someguy@machine:~$ dd if=/dev/urandom of=/dev/null & 
[1] 31406
someguy@machine:~$ dd if=/dev/urandom of=/dev/null & 
[2] 31415
someguy@machine:~$ 

I see them both at the top of the stack:

  
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
31406 someguy   25   0  9200  772  620 R  100  0.0   3:48.62 dd                 
31415 someguy   25   0  9204  772  620 R  100  0.0   3:28.74 dd                 

Note their priority of 25 and their nice value of 0. Let's see how they change as we renice them.

Remember, the nicer a program the lower its priority. The less nice a program the higher its priority. For example:

  • Nice value of 19 is letting people walk on you
  • Nice value of 5 is holding the door
  • Nice value of -20 is pushing people out of the way

You can nice a program when you start it:

   nice -n -10 xmms 
If it's already running:
   renice 19 -p $pid
In this case I'll move one way up and one way down:

lower 406: renice 19 -p 31406
raise 415: renice -19 -p 31415

The results of the first:

someguy@machine:~$ renice 19 -p 31406
31406: old priority 0, new priority 19
someguy@machine:~$ 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
31415 someguy   25   0  9204  772  620 R  100  0.0   6:50.54 dd                 
31406 someguy   39  19  9200  772  620 R  100  0.0   7:10.48 dd              

So its PR number went up (from 25 to 39) along with its nice value. The PR number tracks the nice number, and a higher PR number means a lower scheduling priority. Now, I raise the other:

someguy@machine:~$ sudo renice -19 -p 31415
Password:
31415: old priority 0, new priority -19
someguy@machine:~$ 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
31406 someguy   39  19  9200  772  620 R  100  0.0   8:46.89 dd                 
31415 someguy    6 -19  9204  772  620 R  100  0.0   8:26.92 dd               
So, I've raised the priority of the second process (31415). I could maximize it:
someguy@machine:~$ sudo renice -20 -p 31415
31415: old priority -19, new priority -20
someguy@machine:~$ 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
31406 someguy   39  19  9200  772  620 R  100  0.0  17:00.51 dd                 
31415 someguy    5 -20  9204  772  620 R  100  0.0  16:41.13 dd              

Note that I still can't get it past 5 with the lowest possible nice value. Note also that these are the two most CPU intensive processes so they both take the top of the stack, even with the most extreme nice values. If I had a third it should be sandwiched between the two.
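
The PR/NI pairs seen so far fit a simple pattern, which may explain why 5 is the floor. What follows is only a fit to the numbers in this post for a fully CPU-bound process; the offset of 5 and the cap of 39 are assumptions chosen to match the observations, not documented scheduler behaviour, and other kernels may well differ.

```python
def fitted_pr(nice, penalty=5, max_pr=39):
    """PR that top showed here for a CPU hog with the given nice value."""
    # 20 + nice is the baseline; CPU hogs appear to sit `penalty` above it,
    # and the displayed value never exceeded `max_pr`.
    return min(max_pr, 20 + nice + penalty)


# NI -> PR pairs taken from the top output in this post.
observed = {0: 25, 19: 39, -19: 6, -20: 5}
for ni, pr in observed.items():
    print(ni, pr, fitted_pr(ni) == pr)
```

If the fit holds, a CPU hog at nice -20 lands on 5 because the baseline of 0 still carries the same offset of 5.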

someguy@machine:~$ dd if=/dev/urandom of=/dev/null & 
[1] 32089
someguy@machine:~$ 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
32089 someguy   25   0  9200  768  620 R  100  0.0   0:21.44 dd                 
31415 someguy    5 -20  9204  772  620 R  100  0.0  18:29.59 dd                 
31406 someguy   39  19  9200  772  620 R  100  0.0  18:48.88 dd                

Now it's scheduling them round robin.

15, 6, 89 :: 15, 89, 6 :: 6, 89, 15 :: 89, 6, 15 :: 6, 15, 89

But 15 is on top 2/3 of the time. Interesting. It's still letting the other jobs at the resources.

Now, I'll start a fourth CPU hog with a nice value from the start:

someguy@machine:~$ sudo nice -n -20 dd if=/dev/urandom of=/dev/null &
[2] 32288
someguy@machine:~$ 

I predict that 88 and 15 will stay on top most of the time. Note that it was started with a negative nice value (i.e. a higher priority), which required sudo, so it's running as root:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
31415 someguy    5 -20  9204  772  620 R  100  0.0  30:48.50 dd                 
32288 root       5 -20  9200  772  620 R  100  0.0   4:59.89 dd                 
31406 someguy   39  19  9200  772  620 R   99  0.0  31:05.53 dd                 
32089 someguy   25   0  9200  768  620 R   99  0.0  12:39.09 dd                

Now I'll kill 88 and 15. Then let's try raising the importance of 6 and introducing a new process with a priority just below it:

renice -20 -p 31406 && nice -n -19 dd if=/dev/urandom of=/dev/null &


someguy@machine:~$ sudo su -
root@machine:~# renice -20 -p 31406 && nice -n -19 dd if=/dev/urandom of=/dev/null &
[1] 32656
root@machine:~# 31406: old priority 19, new priority -20

root@machine:~# 

top shows:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
31406 someguy    5 -20  9200  772  620 R  100  0.0  39:28.19 dd                 
32089 someguy   25   0  9200  768  620 R  100  0.0  21:02.10 dd                 
32658 root       6 -19  9200  768  620 R  100  0.0   0:22.50 dd              

Wasn't that fun?

Remember when running top that you can see how all of the CPUs are working by pressing 1.

printing

No, I don't mean having your program print something to the screen:
printf("Hello, world...\n");
I mean sending a PS file (if you're lucky) to a mechanical device which puts its content on paper. What good is that? You can't grep paper. You can't pipe programs to or from it or even copy and paste with it. I guess it works when the power is out, but you can't read it in the dark. A wise man once told me "the only time I've had to print something is when I had to hand it to an idiot". Since this post is about doing something that's not cool I'll describe a non-cool way to approach the issue since it relates to a non-cool part of my job and I'm just going to log it here. Sorry it's lame. I hope the anecdote made this post worth reading on some level.

Anyway, if you're configuring a RedHat box that wasn't minimally installed you can add printers remotely by doing:

ssh -X server
Once you're in you can do:
sudo /usr/sbin/system-config-printer-gui

<sarcasm>love it!</sarcasm> Note that the GUI requires you to have something present for the printer path. This seems like a bug since there are times when I want to print to a network printer like a LaserJet 4000 and this field simply doesn't apply. If this happens you can create the printer with the GUI and then edit the file that it generates (which is /etc/cups/printers.conf) and delete that path.