tinsel.org

Topics in Unix Seminar – What is Unix?

by Thomas Insel

The goal of this talk is threefold:

and the overall purpose is to explain how to effectively use a Unix system. We will start with an overview of some of the basic ideas the system is designed around. After laying the foundation, we’ll briefly cover a number of easy topics and touch on more advanced topics, so we can learn what’s possible.

Where I lie or gloss to simplify stuff, I’ll try to make a note.

I. History

Unix was a simple multiuser operating system written in 1969 at AT&T Bell Labs by Ken Thompson and Dennis Ritchie (the creator of C). The name is a play on the very complex operating system Multics on which Thompson and Ritchie had worked. In contrast, Unix was a very simple system, written to run Space Travel on a spare PDP-7. Later, it was rewritten in C and used as the basis for an in-house typesetting system. Oh, and Brian Kernighan was involved too, but I can never really remember who did what.

Unix was developed at AT&T for several years before Version 7 was widely distributed to universities. Berkeley improved it with stuff like virtual memory and networking, and added many of the common commands, creating BSD Unix. SunOS is derived from BSD. AT&T tried to standardize Unix, creating several versions culminating in System V. Nowadays, almost everything (including Solaris) is based on System V, but incorporates lots of Berkeley stuff. The Open Group owns the Unix trademark.

Except for VMS, Unix is basically the only non-IBM operating system still used on large computers. There are a lot of reasons for this, but mostly because Unix was inexpensive and easy to port to new hardware during an important time in computer history.

II. Concepts

Unix is based on a kernel, essentially a small program that knows how to talk to the computer’s hardware, allocates resources such as processor time to running programs, and enforces security (so that I can’t read your files). The opposite of kernel is user. User programs are known as processes and can make system calls to ask the kernel to do things such as input/output. When using Unix, you’re typically typing at a shell, a program designed to help you manipulate files and run programs. Other special user programs include init, which is responsible for starting up other user programs, and various programs known as daemons, which run in the background and perform various system-related or housekeeping duties. Processes are organized in a tree: when one process runs another, the first is called the parent of the second.

Processes and files each have an owner and an owning group, identified by numbers (the uid and gid). When you log into a Unix computer, you tell it your username and password, but it’s all just numbers internally. The file /etc/passwd stores the mapping between usernames and uids, as well as storing passwords, and /etc/group stores the mapping between groups and gids. Each user has only one uid, and a primary gid, but may belong to any number of secondary groups. These numbers are used for accounting and security, as we’ll see when we talk about file permissions [1].
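To make the name-to-number mapping concrete, here is a small sketch. It is written in Bourne shell (sh) syntax rather than csh, and the entry for the user fred is made up for illustration. The colon-separated layout is the traditional passwd format.

```shell
# A made-up entry in /etc/passwd format:
# name:password:uid:gid:full name:home directory:shell
sample='fred:x:1234:100:Fred Example:/home/fred:/bin/csh'

# Split on colons and pick out the two numeric fields.
uid=$(printf '%s\n' "$sample" | cut -d: -f3)
gid=$(printf '%s\n' "$sample" | cut -d: -f4)
echo "fred has uid $uid and primary gid $gid"

# On a real system, id reports the numbers for whoever runs it.
id
```

The same cut command pointed at the real /etc/passwd would list every uid on a standalone machine. On a networked setup, the file may not be the whole story.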

All of the files and directories form a tree, even if they’re physically stored on different disks. In a networked environment, some or all of the directories may be shared between multiple computers, but they don’t appear any different to users.

Each directory contains a special directory . which refers to itself, and .. which refers to its parent. Note that directories are really just a special type of file that stores a list of children [2]. You can refer to files by absolute paths (starting with /, the root of the tree) or by relative paths, using ..’s and such. By the way, filenames in Unix are case-sensitive and can include almost any character except slashes (although spaces and control characters get annoying, and there are problems with special characters that the shell interprets).

Each user has a home directory, which they “own.” The system looks here for some configuration files, and it’s a place to store files. When you log into the computer, or start a new shell, you usually start off located in your home directory.

Each file has an owner, a group, and an associated set of permission bits. Only the file’s owner can change these. In each of the categories user, group, and other, you can control read, write, and execute permission. These control the access given yourself, other users in your group, and other users not in your group. For directories, read permission allows you to list files, write permission allows you to create and delete files (even if you don’t own them), and execute permission allows you to access files provided you know their names [3].
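Here is a hedged sketch of the permission bits in action, in Bourne shell (sh) syntax. It works in a scratch directory made with mktemp -d, which is common but not available on every old system. The file name notes is just an example.

```shell
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
touch "$dir/notes"

# Owner: read and write. Group: read only. Others: nothing.
chmod u=rw,g=r,o= "$dir/notes"

# The first ten characters of a long listing show the file type
# followed by the user, group, and other permission bits.
perms=$(ls -l "$dir/notes" | cut -c1-10)
echo "$perms"        # -rw-r-----

rm -r "$dir"
```

Reading the listing left to right: a dash for an ordinary file, then rw- for the owner, r-- for the group, and --- for everyone else.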

Soft links are special files that point to another file by name.
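A quick sketch of what "by name" means here, again in Bourne shell (sh) syntax with made-up file names. The link stores only the name of its target, so it keeps existing even when the target is gone.

```shell
dir=$(mktemp -d)
echo "hello" > "$dir/original"

# Create a soft link named "alias" that points at the name "original".
# The name is interpreted relative to the link's own directory.
ln -s original "$dir/alias"

# Reading through the link gives the original's contents.
via_link=$(cat "$dir/alias")
echo "$via_link"

# Remove the target: the link is still there, but it points at nothing.
rm "$dir/original"
if [ -e "$dir/alias" ]; then dangling=no; else dangling=yes; fi
echo "dangling: $dangling"

rm -r "$dir"
```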

Device files are another type of special file, used to communicate with hardware devices like serial ports. There’s also a device file corresponding to each logged-in user. The shell reads and writes to your device file to learn what you type and print text for you to see [4].

III. The Directory Tree

Here’s a directory tree. This is an idealized view of a standalone machine. The necessities of a networked environment with multiple architectures and operating systems make the math department setup a bit more complicated. Typically, /usr is shared between multiple machines and is read only, but each machine has its own root directory.

Pathname      Contents
/             “The root directory.”
/bin          Basic commands.
/dev          Device files.
/etc          Basic configuration files and system programs.
/home         One common location for home directories.
/lib          Shared libraries, sometimes compiler support.
/sbin         Another location for system programs.
/tmp          Small temporary files.
/sys          Files for building a new kernel.
/usr          Less basic files, typically shared and read-only. Has subdirectories corresponding to those at the root level, such as /usr/bin, /usr/etc/, ….
/usr/bin      Most commands.
/usr/etc      Configuration files and some daemons.
/usr/games    What it says.
/usr/lib      Shared libraries, support for compiler, and support files for various other programs.
/usr/local    Locally added files, also /usr/custom. Typically contains another set of directories, bin, etc, lib, man, and so on.
/usr/man      Normal location for man pages, unless they’re in /usr/share/man.
/usr/sbin     System programs.
/usr/share    Files shared between different architectures.
/usr/src      Normal location for source code.
/usr/ucb      Under SysV, Berkeley versions of binaries go here.
/var          Frequently changed files, spool directories, etc.
/var/mail     Users’ mail files.
/var/tmp      Another place for temp files.

IV. The C Shell and Some Basic Commands

Once you log into a computer, you’re presented with a prompt to type at or perhaps a desktop with several windows you can type into. The program that watches you type and reacts is called a shell. You use it to run other programs. Most shells let you write programs using structures like if/then, for…next, and so on — we’ll talk about this later. Here, we’ll be talking about the C shell, /bin/csh, but everything also works with tcsh. The ideas are the same in Bourne-derived shells sh and bash.

You should know about the files .login and .cshrc. These files contain C shell commands that are run when the shell starts up. They are responsible for setting up your terminal, and lots of shell configuration options. Since we’re going to talk about the default math department setup for now, since this involves shell programming, and since there are some subtle issues involved, we’ll pass over this for now.

Variables

There are two kinds of variables in the C shell: shell variables and environment variables. Shell variables control the configuration of the shell (prompts, and so on), or can be used for storing random stuff. Every process has environment variables, and inherits their settings from its parent (such as a shell), so these can be used for more general configuration. Note that these settings are local, so if you change them in one window, they won’t change other shells you might be running.

Let’s try a quick tutorial: type “set” to see the current settings. Now type “set fred = hello” to put the text “hello” into the variable fred. Type “echo $fred” to see what’s in the variable fred. Echo is a command that just repeats its arguments, and the shell replaces $fred with the contents of the variable fred before running echo. We’ll see more examples of this sort of shell expansion soon.

To set environment variables, use “setenv” instead. For some reason, environment variables always have names IN ALL CAPS. Here are some examples (note the missing =):

setenv
setenv FRED hello
echo $FRED
unsetenv FRED
printenv
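The difference between the two kinds of variable shows up when the shell starts another process. This sketch uses Bourne shell (sh) syntax, with the csh equivalents in comments. The child here is simply another copy of sh.

```shell
# csh: set fred = hello      (shell variable, private to this shell)
fred=hello
# csh: setenv FRED hello     (environment variable, passed to children)
FRED=hello
export FRED

# Ask a child process what it can see. ${name:-not set} prints the
# fallback text when the variable is missing.
child_fred=$(sh -c 'echo "${fred:-not set}"')
child_FRED=$(sh -c 'echo "${FRED:-not set}"')
echo "child sees fred: $child_fred"
echo "child sees FRED: $child_FRED"

# csh: unsetenv FRED
unset FRED
```

The child inherits FRED but knows nothing about fred, which is why general configuration goes in environment variables.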

Path

One special environment variable is $PATH. It is a list of locations in which the shell looks for programs, separated by colons. On a simple computer it might look like:

/usr/bin:/bin:/usr/local/bin:/usr/games

The C shell reads this variable on startup, and caches the list of commands in these directories. Very few commands are actually built into the shell. Most are found because they’re in one of the directories that $PATH lists. One built-in command is rehash, which you can use to tell the shell to rebuild this cache.
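The lookup itself can be sketched in a few lines of Bourne shell (sh). The function name find_in_path is hypothetical, and this version walks the directories every time instead of keeping a cache the way the C shell does.

```shell
# Hypothetical helper: print the first entry in $PATH that holds
# an executable file with the given name.
find_in_path() {
    old_ifs=$IFS
    IFS=:                     # split $PATH at the colons
    for d in $PATH; do
        if [ -x "$d/$1" ] && [ ! -d "$d/$1" ]; then
            IFS=$old_ifs
            echo "$d/$1"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1                  # not found anywhere in $PATH
}

found=$(find_in_path ls)
echo "$found"
```

Because the search stops at the first match, the order of directories in $PATH decides which version of a command you get.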

Basic File Commands

So far, the commands we’ve used (set, setenv, echo, and so on) have mostly been built into the shell. Let’s list some simple, useful commands to manipulate files:

ls      - list files in the current directory.
pwd     - print the current directory.
touch   - create an empty file (or update the timestamps of an existing file). Use this to create files to play with.
cp      - copy files.
mv      - move files.
rm      - delete files.
mkdir   - create a new directory.
cd      - change working directory.
rmdir   - delete a directory.
chmod   - change the permissions of a file (see also chgrp and chown).
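Here is a short session that strings these commands together, as a sketch in Bourne shell (sh) syntax. The commands themselves are the same from csh. Everything happens inside a scratch directory, and the names playground, a, b, and c are arbitrary.

```shell
dir=$(mktemp -d)      # scratch directory
cd "$dir"

mkdir playground      # create a new directory
cd playground
touch a               # create an empty file
cp a b                # copy it
mv b c                # rename the copy
listing=$(ls)         # the directory now holds a and c
echo $listing
rm a c                # delete both files
cd ..
rmdir playground      # rmdir only removes an empty directory

cd /
rm -r "$dir"
```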

To see specifics on how to use these programs, you can use the man command, for example “man ls”. Most Unix commands share a simple syntax: “command flags arguments”, where the flags are special arguments in the form of a dash and a character or list of characters. For example, “ls -l” lists the files in the current directory in a long format that tells who owns the files, what their permissions are, how big they are, and when they were last changed. “ls -l /bin” prints a long listing of all of the files in the /bin directory, and so on.

Some Special Characters

We’ve already seen that the shell replaces $fred with the contents of the variable fred and $FRED with the contents of the environment variable FRED. There are many other replacements made. For example, the shell replaces a tilde (~) with your home directory and ~tinsel with my home directory. A very important idea is wildcard expansion. The shell replaces a * with a list of all files in the current directory, ../* with a list of all files in the parent of the current directory, and ~tinsel/* with a list of my files. To see a list of commands that start with r, try “ls /usr/bin/r*”. The shell matches a ? with any single character in the same way. To see a list of all files in the current directory that have three-character names, try “ls ???” [5].

By the way, files that have names that start with a period are special. They don’t show up normally — ls doesn’t list them and * doesn’t expand to match them. To list them use “ls -a”.
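You can watch the shell do the expansion by handing the patterns to echo, which just prints whatever arguments it receives. This is a sketch in Bourne shell (sh) syntax with made-up file names. The patterns behave the same way in csh.

```shell
dir=$(mktemp -d)
cd "$dir"
touch abc xyz report .hidden

all=$(echo *)        # every name that does not start with a period
three=$(echo ???)    # names exactly three characters long
dots=$(echo .h*)     # a leading period has to be matched explicitly

echo "$all"          # abc report xyz
echo "$three"        # abc xyz
echo "$dots"         # .hidden

cd /
rm -r "$dir"
```

Note that echo never sees the * or the ?. The shell has already replaced them with file names before echo starts.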

The shell has many special characters. We’ve already talked about some, and we’ll be talking about many of the rest when we talk about shell scripts, but a few are commonly used directly. One important one is backslash. Use it to escape out characters that would otherwise mean something to the shell. For example, you can create a file named ? with “touch \?” and a file named \ with “touch \\” (but you probably shouldn’t) [6].

The shell uses three different types of quotes. Double quotes (") and single quotes (') basically give alternatives to backslash when using special characters. The difference: "$fred" evaluates to the contents of the variable fred while '$fred' evaluates to the string $fred. The last quote, the backquote (`), is different. The shell runs the quoted command and replaces the quoted text with the output of the command.
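The three kinds of quote, plus backslash, side by side. This sketch is in Bourne shell (sh) syntax, so the assignment on the first line differs from csh, but the quoting rules shown are the ones described above.

```shell
fred=hello                 # csh: set fred = hello

double="$fred"             # the variable is expanded
single='$fred'             # taken literally
escaped=\$fred             # backslash protects the dollar sign
output=`echo $fred world`  # replaced by the command's output

echo "$double"             # hello
echo "$single"             # $fred
echo "$escaped"            # $fred
echo "$output"             # hello world
```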

If you want to use more than one command on the command line, separate them with semicolons “;” and they will run serially. To group them together (for use with redirection or backgrounding, see below), use parentheses.

History

The shell keeps track of previous commands you’ve typed. You can list its memory with “history”. Some of the ways to use history:

!!                    Repeat the last command.
!-2                   Repeat the command before last.
!33                   Repeat command 33 (that’s why there’s a number in your prompt).
!x                    Repeat the last command beginning with an x.
^mispell^misspell^    Repeat the last command, correcting a spelling mistake.

There are more, see the csh man page if you care, but with tcsh and more modern shells, you can use the up and down arrows to scroll through previous commands, and the left and right arrows to edit them.

Redirection

One of the strengths of Unix is the capability to use several simple modular commands together to create a more powerful command. This is done with one of several redirection operators:

Operator  Meaning                 Example
>         send output to file     ls -l > directory_list
>>        append output to file   ls -l .. >> directory_list
<         get input from file     sort < people

When configured normally, the shell prevents you from using > with an existing file. If the appropriate shell variable (noclobber) is not set, it will erase the previous file. Most error messages will still get sent to the console, not to the file [7].
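Here are the three operators, plus parentheses for grouping, in one small sketch. It uses Bourne shell (sh) syntax and made-up file names, and it assumes noclobber is not set. The operators shown work the same way in csh.

```shell
dir=$(mktemp -d)
cd "$dir"

printf 'carol\nalice\nbob\n' > people     # > creates or overwrites
printf 'dave\n' >> people                 # >> appends
sorted=$(sort < people)                   # < reads input from a file
echo "$sorted"

# Parentheses group commands so one redirection covers them all.
(echo first; echo second) > both
lines=$(wc -l < both)
echo "both has" $lines "lines"

cd /
rm -r "$dir"
```

Without the parentheses, only the output of the second echo would land in the file.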

To send output from one command directly to another, we use the “pipe” operator “|”. For example,

spell thesis | sort | uniq > words_I_cant_spell
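The sort in the middle of that pipeline matters, because uniq only collapses duplicate lines that are next to each other. Here is a sketch you can run without a thesis or a spell program. The printf stands in for the output of spell, and the misspelled words are made up.

```shell
# Stand-in for the output of spell: one word per line, with repeats.
words=$(printf 'teh\nrecieve\nteh\nseperate\nrecieve\n' | sort | uniq)
echo "$words"

# Count the distinct words by adding one more stage to the pipe.
count=$(printf 'teh\nrecieve\nteh\nseperate\nrecieve\n' | sort | uniq | wc -l)
echo "distinct words:" $count
```

Each command in the pipe is simple on its own. The combination is what does the useful work.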

Aliases and Scripts

One way to create new commands is to create shell scripts (like DOS *.BAT files, but with more programming capabilities). Another way is to create aliases. There’s only so much time, so we’ll put all of this off until a special talk on scripting.

Job Control

You can run more than one program at once. If you have a noninteractive command that will take a while, you can run it in the background with & like so:

long_math_calculations > thesis_results &

If you are running a program and wish to temporarily suspend it, press control-Z. Your program will stop running and you return to the command prompt (some programs like pine override this). To allow your process to run in the background, type bg. It will pause if it needs input from the console. To return to it, type fg.

If you do this with several programs at once, you will need to know how to refer to each separately. Use the jobs command to list your running and suspended programs by number, and then “fg %1” to bring the first to the foreground, etc. By the way, you can’t background a process in one shell and foreground it in another.
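Suspending with control-Z and using fg or bg only make sense when you are typing at a terminal, but the & part can be sketched as a script. This one is in Bourne shell (sh) syntax, where $! holds the process id of the most recent background command. The sleep commands stand in for long calculations.

```shell
dir=$(mktemp -d)

# Stand-ins for long-running commands, started in the background.
(sleep 1; echo "one done") > "$dir/one" &
pid_one=$!
(sleep 1; echo "two done") > "$dir/two" &
pid_two=$!

echo "started processes $pid_one and $pid_two"

# The shell is free to do other things here. wait blocks until
# the named background processes have finished.
wait "$pid_one" "$pid_two"

result=$(cat "$dir/one" "$dir/two")
echo "$result"

rm -r "$dir"
```

Both jobs sleep at the same time, so the whole thing takes about one second rather than two.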

kill        - send a signal to a process, or terminate a process
ps          - display the status of current processes 
top         - display and update information about the top cpu processes

nice        - run a command at low priority
nohup       - run a command immune to hangups

at, batch   - execute a command or script at a specified time
atq         - display the queue of jobs to be run at specified times
atrm        - remove jobs spooled by at or batch
crontab     - install, edit, remove or list a user's crontab file

Control Characters

These can be changed with the stty command. The primary confusion is between backspace (control-H) and delete (DEL).

control-C    Interrupt/Kill current program
control-D    End of file; ends input to most pipes
control-Q    Resume output
control-S    Pause output
control-U    Erase to beginning of line
control-W    Erase previous word
control-Z    Suspend current program

Quitting

As with everything else in Unix, there are several ways of exiting the shell. The simplest is to type “exit”.

V. More Commands

Getting Help

The standard Unix command to get help is man, which stands for manual. The manual comes in several sections:

1    User commands
2    System calls
3    Library functions
4    Device drivers
5    File formats
6    Games
7    Miscellaneous
8    System administration

In general, you’ll be interested in the first section. If commands with the same name occur in multiple sections, you can use “man 1 command” to be sure to get the version you want. Man pages are usually brief instructions on how to use a particular command, including a summary of flags and other arguments. Traditionally, there’s a second volume of the manual with tutorials, but it’s not usually online.

Each man page begins with a one line description of the command, many of which are quoted later in this document. If you’re looking for a command, but don’t know its name, you can use the apropos command to search through these lines. For example, “apropos print” will list some commands that involve the printer. It will also list standard C functions like printf. If you want to see the one line description of a command instead of the complete page, use whatis. Unfortunately, not all man pages show up in apropos/whatis.

A nonstandard command available locally is help. This provides access to a database of helpful information, laid out as files and directories. You’ll see a menu of options. Simply type a name to see the file or change directories. Type “..” to go up one directory. You can also type help topic from the command line if you know what subject you want.

These help files are of mixed usefulness. Some are locally written. For example, help sun-where provides a list of installed computers and their locations, and is almost current. Others are general Berkeley information, and may be as many as fifteen years old, referring to nonexistent computers, software, and policies.

More File Commands

df      - report free disk space on file systems
du      - display the number of disk blocks used per directory or file
file    - determine the type of a file by examining its contents
find    - find files by name, or by other characteristics (and then do something with them)
ln      - make hard or symbolic links to files
whereis - locate the binary, source, and manual page files for a command
which   - locate a command; display its pathname or alias

Pipes

Actually, most of these (and other) programs can be used as a pipe: “cat /etc/motd | more” or on their own: “more /etc/motd”, but these are some programs that are typically used as pipes.

awk      - a pattern scanning and processing language, normally works on columns
cat      - concatenate and display 
colrm    - remove characters from specified columns within each line
cut      - remove selected fields from each line of a file
dd       - convert and copy files with various data formats
dos2unix, unix2dos - convert text file between DOS & ISO formats
expand, unexpand - expand TAB characters to SPACE characters, and vice versa
fmt, fmt_mail - simple text and mail-message formatters
fold     - fold long lines for display on an output device of a given width
grep, egrep, fgrep - search a file for a string or regular expression
head     - display first few lines of specified files
join     - relational database operator
more     - browse or page through a text file
nl       - line numbering filter
od       - octal, decimal, hexadecimal, and ascii dump
paste    - join corresponding lines of several files, or subsequent lines of one file
pr       - prepare file(s) for printing, perhaps in multiple columns
rev      - reverse the order of characters in each line
sed      - stream editor
sort     - sort and collate lines
spell    - report spelling errors
tail     - display the last part of a file
tee      - save input to a file and pass it along
tr       - translate characters (use for ROT13, CRLF conversion, etc.)
tsort    - topological sort
uniq     - remove or report adjacent duplicate lines
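Two small examples of these filters at work, as a sketch in Bourne shell (sh) syntax. The passwd-style lines and the user names in them are made up for illustration.

```shell
# ROT13 with tr: rotate each letter thirteen places.
secret=$(echo "Hello, Unix" | tr 'A-Za-z' 'N-ZA-Mn-za-m')
echo "$secret"             # Uryyb, Havk

# Pull the login shell (field 7) out of passwd-style lines, then
# count how often each one occurs and keep the most common.
counts=$(printf '%s\n' \
    'ann:x:1:10::/home/ann:/bin/csh' \
    'bob:x:2:10::/home/bob:/bin/sh' \
    'cy:x:3:10::/home/cy:/bin/csh' |
    cut -d: -f7 | sort | uniq -c | sort -rn | head -1)
echo $counts               # 2 /bin/csh
```

Running the ROT13 line on its own output gives back the original text, since thirteen is half the alphabet.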

Other Text Processing Commands

look   - find words in the system dictionary or lines in a sorted list
wc     - display a count of lines, words and characters
sum    - calculate a checksum for a file
dircmp - compare directories

comm   - display lines in common, and lines not in common, between two sorted lists
cmp    - perform a byte-by-byte comparison of two files
diff   - display line-by-line differences between pairs of text files
diff3  - display line-by-line differences between 3 files
sdiff  - contrast two text files by displaying them side-by-side
patch  - the opposite of diff

ed, red         - basic line editor
ex, edit, e     - line editor
vi, view, vedit - visual display editor based on ex

split  - split a file into pieces
csplit - split a file with respect to a given context

Other Users & System Stuff

finger      - display information about users
id          - print the user name and ID, and group name and ID
last        - indicate last logins by user or terminal
mpstat      - show multi-processor usage
su          - super-user, temporarily switch to a new user ID
uptime      - show how long the system has been up
users       - display a compact list of users logged in
w           - who is logged in, and what are they doing
who         - who is logged in on the system
whoami      - display the effective current username

mesg        - permit or deny messages on the terminal
talk        - talk to another user
write       - write a message to another user

arch        - display the architecture of the current host 
mach        - display the processor type of the current host
hostid      - print the numeric identifier of the current host
hostname    - set or print name of current host system
uname       - display the name of the current system

groups      - display a user's group memberships
newgrp      - log into a new primary group
passwd      - change your password, finger name, and/or shell (also chfn, chsh)

stty        - set or alter the options for a terminal
tset, reset - establish or restore terminal characteristics

tty         - display the name of the terminal
xenv        - obtain or alter environment variables for command execution

Pine and Pico

You’ve probably used the mail program pine. If you know how to use older mail programs like Mail, you probably don’t need these notes anyhow. If you haven’t used it, it’s a nice menu-driven program for reading and sending electronic mail. You can learn to use it by reading the list of available commands that always appears at the bottom of the screen.

Pine’s built in text editor is called pico, and can be used on its own. To edit a file named bar, just type “pico bar”. It’s not very powerful, but it’s easy to use. You could use it for everything, but I recommend learning a stronger editor like vi or emacs.

Other mail-related programs:

from       - display the sender and date of newly-arrived mail messages
biff       - give notice of incoming mail messages
mail, Mail - read or send mail messages

Printing

Printers available for graduate student use are:

hp1 in 708C Evans
hp3 in 838 Evans
hp4 in 1002 Evans

They can be called by the alternate names hp1s, hp3s, and hp4s for single-sided output.

To print a file, use the lpr command with the -P flag to specify the printer: “lpr -Php3 document”. You can print either text or PostScript files this way. To see files waiting for the printer, use “lpq -Php3” and to remove one of your files from the queue use “lprm -Php3 tinsel” (assuming you’re me).

If the print queue does not appear to be moving and nothing’s coming out of the printer, there are a few things to try. If the LED is flashing, it’s probably just working on a complicated job. Otherwise, check if the printer is offline, or if there’s an error message on the printer’s console. If these aren’t the problem, try “lwrestart hp3” (no -P).

We have quotas on printing, too. To see how much you’ve printed, use the paper command.

TRANSCRIPT utilities include:

psnup    - print multiple pages on a sheet of paper.
enscript - convert text files to POSTSCRIPT format for printing

Disk Space, Archives, Compression

Remember df and du.

cpio  - copy file archives in and out
compress, uncompress, zcat - compress or expand files, display expanded contents
crypt - encode or decode a file
des   - encrypt or decrypt data using Data Encryption Standard
tar   - create tape archives, and add or extract files

Late every night, the math department computers check every file on the system, and total their sizes by file owner. You can use dqstatus to check how much disk space you’re using system-wide, and how this compares to your quota. If you go over quota, you will be allowed only five logins. You will be warned when you log in, and must delete or compress files to save space. When the system checks disk usage on the next night, the login restriction will go away.

More Stuff

banner   - display a string in large letters
bc       - arbitrary-precision arithmetic language
cal      - display a calendar
calendar - a simple reminder service
date     - display or set the date
dc       - desk calculator
leave    - remind you when you have to leave
script   - make typescript of a terminal session
sleep    - suspend execution for a specified interval
time     - time a command
units    - conversion program
vacation - reply to mail automatically
wait     - wait for a process to finish

GNU

less
gzip

Standard Extras

uuencode, sc, MIME, etc.

Disk Drive

eject     - eject media device from drive
fdformat  - format diskettes for use with SunOS
mtools
volcheck

We’ve omitted certain types of commands from this list, with the intention of dealing with them later, including anything to do with the X Window System, writing and compiling programs, networking, and text formatting packages like nroff and troff. We’ll come back to some of these later.

VI. Notes

  1. Processes have both real and effective owners and groups. In networked environments, the information that would be in the password and group files is distributed via NIS.
  2. Like users, files are referenced by number; the names are stored in the directory. Files can therefore appear in more than one place, under more than one name. If you really want to know more, read about hard links.
  3. I’ve ignored the sticky bit and setuid/setgid bits, and I really should explain how these all are stored as a four digit octal number and talk about setting your umask, the binary inverse of permissions allowed.
  4. Other special files are UNIX-domain sockets and named pipes, both used for communication between processes.
  5. There’s plenty more where this comes from. For example, you can see all files that start with lower case letters with “ls [a-z]*”. Just read the man page for csh.
  6. You might expect backslash to help with certain problems like deleting a file named -r. The problem is that programs, not the shell, interpret their own flags, and so they see the filename as a flag. The easiest way to deal with these files is “rm ./-r”.
  7. To send error messages as well, use >&, >>& or |&. To send only error messages is possible but complicated.
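
Notes 2 and 3 are easier to believe once you’ve seen them happen. Here is a sketch in Bourne shell (sh) syntax, using made-up file names in a scratch directory. It assumes new files are requested with read and write permission for everyone, which is what touch normally does.

```shell
dir=$(mktemp -d)
cd "$dir"

# Note 2: a hard link is a second name for the same file.
echo "some text" > first
ln first second
rm first                      # the data survives under the other name
kept=$(cat second)
echo "$kept"

# Note 3: umask lists the permission bits to withhold from new files.
# 027 withholds write from the group and everything from others.
umask 027
touch fresh
mode=$(ls -l fresh | cut -c1-10)
echo "$mode"                  # -rw-r-----

cd /
rm -r "$dir"
```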