CS/CE 218 Lecture -*-Outline-*- for text-filters
note: when I taught from this, I didn't draw the diagrams, except the first one
Doing so would help the students, but would take more time....
connection: we have finished the introductory section of the course.
we learned about input to unix, how to invoke commands,
how to find out information about specific commands,
and about files and directories, the basic data manipulated by unix.
Thus far we've been looking at ways that you, a human,
can create, modify, and observe directories and files.
But the real strength of Unix is that it amplifies our feeble fingers
by letting us manipulate directories and files under the control
of our programs. Manipulation under program control is the topic
of our next large section of the course: Shell programming.
* Unix text filters and pipelines
advert: Shell programming is not only useful in and of itself,
it's also the medium through which we'll describe the
basic conventions for Unix programs: "shape" of a unix process,
powerful standard way to construct software, very influential
helps make tools easy to combine
Similar ideas in MS-DOS,... (copied from Unix) but not applied
as consistently
as a demonstration below, use a file containing the following
#! /bin/sh
echo "stdout"
echo "stderr" 1>&2
** plumbing connections of a process
a process is a program that is running (has space, takes up CPU)
Q: If a process were a piece of plumbing, what shape would it have?
/dev/tty
^
| 2=stderr
======== | =======
/dev/tty ---> --> /dev/tty
0=stdin ================== 1=stdout
a process in Unix can be viewed as a pipe, more accurately as a T
it reads from one standard file, writes another,
and sends errors to a third
Q: What are the standard files that a process uses?
diagnostic output often called standard error
does a program have to follow these conventions?
no, but most do
general exceptions are ``display'' programs (emacs)
Q: Why do you think diagnostic output is distinguished from regular
output in Unix?
to carry the plumbing analogy further, water comes from a spring, etc.
eventually goes to a waste treatment plant...
What are some sources of input in Unix?
terminal also has a file name: /dev/tty
What are some sinks for output?
the ultimate sink is /dev/null (like NUL in MS-DOS)
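a minimal session illustrating /dev/null as the ultimate sink
(it also happens to be an empty source):

```shell
# /dev/null discards everything written to it
echo "noise" > /dev/null        # prints nothing; the bytes are gone
# reading from it yields immediate end-of-file
wc -c < /dev/null               # counts 0 bytes
```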
normal connections of a process run by the shell...
------------
$ testio
stdout
stderr
------------
/dev/tty
^
|
| 2=stderr
0=stdin ======== | ======= 1=stdout
/dev/tty ---> testio --> /dev/tty
==================
*** redirection
To carry out the plumbing analogy,
we should be able to divert the water (data)
into "buckets" (files)
or to pump it out of other buckets
that is, a command's standard input and output can be redirected to other files
e.g., sed -e 's/dis//g' <input >output
copies the file input to output, changing disorder to order
**** output
As in MS-DOS
Q: Where does the standard output go by default?
to redirect the standard output, you use > and then a pathname...
Q: What does the character that is used to redirect standard output
mean to you pictorially? Is it well-chosen?
-------------
$ testio >junk
stderr
$ cat junk
stdout
-------------
picture of execution of testio >junk
/dev/tty
^
| 2=stderr
======== | =======
/dev/tty --> testio --> junk
0=stdin ================== 1=stdout
when you say foo >bar, bar is created as an empty file,
this means that if bar exists it's emptied and a new one is written
Q: Do you agree with the way the shell handles redirection of
output to an existing file?
To add to the end of an existing file instead, you use >>
Q: In what situations would you want to use >> instead of >?
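a minimal session contrasting the two (log is a made-up file name):

```shell
echo first  > log       # > creates log, or empties it if it exists
echo second >> log      # >> appends to the end instead
cat log                 # shows both lines: first, second
echo third  > log       # > truncates again: earlier contents are lost
cat log                 # shows only: third
```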
**** input
Q: Where does the standard input come from by default?
You may need to tell a program you don't want to talk to it anymore.
What character indicates end-of-file (by default)?
^D (not ^Z as in MS-DOS)
redirection of input is similar to redirection of output,
except the sign is different (hence < instead of >)
and it hardly makes any sense to create the input file
picture of execution of foo <bar
                         /dev/tty
                            ^
                            | 2=stderr
              ======== | =======
      bar --->       foo       --> /dev/tty
     0=stdin ================== 1=stdout
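for instance (nums is a made-up file name):

```shell
printf 'one\ntwo\nthree\n' > nums   # make a small input file
wc -l < nums                        # wc reads nums via stdin; reports 3 lines
# note: wc prints no file name here, since it only sees standard input
rm -f nums
```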
It's possible (as in the example) to use both input and output
redirection in the same command.
Q: Which is done first: creation of files for output redirection
or reading input files?
so what happens if you say sort <bar >bar?
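because the shell truncates the output file before the command ever runs,
sending a command's output back into its own input file destroys the data;
a sketch:

```shell
printf 'b\na\nc\n' > bar    # a small unsorted file
sort < bar > bar            # the shell empties bar before sort can read it
wc -c < bar                 # bar is now empty: 0 bytes
rm -f bar
```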
redirection has important implications for programmers:
prompting: stdin might not be a person at a keyboard
should test to see if stdin is a terminal
prompts sent to stdout or stderr might not be seen!
so prompts should be sent to /dev/tty
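a sketch of the terminal test: the test (i.e., [) command's -t operator
reports whether a file descriptor is attached to a terminal:

```shell
# prompt only if a person is on the other end of stdin
if [ -t 0 ]; then
    echo "Enter your name:" > /dev/tty   # prompt goes straight to the terminal
fi
# when stdin is a file or a pipe, [ -t 0 ] fails and no prompt appears
```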
**** diagnostic output (section 4.7)
where does stderr go by default?
like standard output, can also redirect stderr in the shell
Note: an advantage of the Bourne shell
you can't easily redirect stderr in the C shell!
it only allows you to send both stderr and stdout to the same place
Q: How do you send error messages to a file instead of the terminal?
2> errfile
This is curious, can it be used in general?
yes, 1> f is like >f
0< f is like <f
---------------
$ testio 2>errs
stdout
$ cat errs
stderr
---------------
picture of execution of testio 2>errs
errs
^
| 2=stderr
======== | =======
/dev/tty --> testio --> /dev/tty
0=stdin ================== 1=stdout
-------------
$ testio 2>errs 1>junk
$ cat junk
stdout
$ cat errs
stderr
$
-------------
*** duplication of file descriptors (&1, &2, ...)
the file descriptors of a child process are inherited from the shell
can treat a file descriptor as a file for redirection
e.g., 2>&1 makes 2 go to a "duplicate" of fd 1, i.e., stdout
1>&2 makes stdout go to the error stream
Q: If "echo" sends output to standard output, how would you use it
to send an error message to the diagnostic output?
echo "$errmsg" 1>&2
or simply echo "$errmsg" >&2
----------------
$ testio 1>&2
stdout
stderr
$ testio 2>&1
stdout
stderr
----------------
picture of the execution of testio 1>&2
/dev/tty
^
/|-------------
/ | \
| | 2=stderr |
0=stdin ======== | ======= 1=stdout
/dev/tty ---> sh --> /dev/tty
| =================== |
| | /
| \ /
| \ /
| \ /
| | 2=stderr |
| ======== | ======= |
-> testio --
0=stdin ================== 1=stdout
picture of the execution of testio 2>&1
/dev/tty
^
|
| 2=stderr
0=stdin ======== | ======= 1=stdout
/dev/tty ---> sh --> /dev/tty
| ================== /|
| / |
| ------ |
| / |
| / |
| | 2=stderr |
| ======== | ======= |
-> testio --
0=stdin ================== 1=stdout
**** order of duplication and redirection matters
consider the following examples (>&2 is same as 1>&2)
-------------
$ testio >&2 2>errs
stdout
$ cat errs
stderr
-------------
picture of the execution of testio >&2 2>errs
/dev/tty
^
|\------------
| \
|2=stderr |
0=stdin ======== | ======= 1=stdout
/dev/tty ---> sh --> /dev/tty
| ================== |
| /
| /
| errs /
| ^ /
| | 2=stderr |
| ======== | ======= |
-> testio --
0=stdin ================== 1=stdout
-------------
$ testio 2>errs >&2
$ cat errs
stdout
stderr
-------------
picture of execution of testio 2>errs >&2
/dev/tty
^
|
| 2=stderr
0=stdin ======== | ======= 1=stdout
/dev/tty ---> sh --> /dev/tty
| ==================
|
|
| errs <--------
| ^ ^
| | 2=stderr |
| ======== | ======= |
-> testio --
0=stdin ================== 1=stdout
-------------
$ testio 2>&1 >junk
stderr
$ cat junk
stdout
$ testio >junk 2>&1
$ cat junk
stdout
stderr
-------------
picture of the execution of testio 2>&1 >junk
/dev/tty
^
|
|
| 2=stderr
0=stdin ======== | ======= 1=stdout
/dev/tty ---> sh --> /dev/tty
| ================== /
| /
| --->--
| /
| /
| | 2=stderr
| ======== | =======
-> testio --> junk
0=stdin ================== 1=stdout
picture of execution of testio >junk 2>&1
/dev/tty
^
|
|
| 2=stderr
0=stdin ======== | ======= 1=stdout
/dev/tty ---> sh --> /dev/tty
| ==================
|
|
| ---------->-|
| ^ |
| | 2=stderr |
| ======== | ======= |
-> testio --> junk
0=stdin ================== 1=stdout
** pipelines (section 4.5, 4.6)
plumbing is most useful to transport water over long distances,
without having to put it in buckets
similarly, "pipelines" can put data through its paces without
having to store the intermediate results in buckets
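for example, the same word-splitting-and-sorting job done with buckets and
without (doc is a made-up file name):

```shell
printf 'the cat sat\n' > doc
# with buckets: name, fill, and clean up temporary files
tr ' ' '\012' < doc > tmp1
sort tmp1 > tmp2
cat tmp2
rm tmp1 tmp2
# with a pipeline: no intermediate files at all
tr ' ' '\012' < doc | sort
rm doc
```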
*** examples
-----------------------
echo iron copper | sed -e 's/.*/gold/'
echo these are different args | tr ' ' '\012' | sort
find . -type f -mtime -$N -print | sed -e 's!\./!!'
-----------------------
last finds files modified in last N days,
the sed takes the ./ out of their names
-----------------------
deroff -w <$f | sort -u +0 | \
/usr/lib/spell/spellprog /usr/lib/spell/hstop 1 | \
/usr/lib/spell/spellprog /usr/lib/spell/hlista /dev/null |\
sed '/^\./d' | sort -u +1f +0
-----------------------
spell program basically like this;
$f filtered into words, then sorted, then checked
for various kinds of matches, then .'s are deleted
then output is sorted
easier to sort twice than to get programs to produce
sorted output
*** principles of pipelining
Q: What are some of the advantages of using pipelines instead
of storing intermediate results in temporary files?
don't have to name and remove temporary files
parallel processing (overlap between I/O and computing)
may be really parallel on a multicomputer
a "piece of a pipeline" is an important thing in Unix,
so it has to have a name...
Q: What is a "step of a pipeline" called?
Can all commands be used as filters?
no, it's possible to write ones that can't be.
but not good style! Should always be able to use as filter
if possible
If a command can take one file argument, it should recognize no args
as "standard input"
If a command can sensibly write to standard output, it should
do so by default
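the conventions above can be sketched as a toy filter (shout is a made-up
script name; the upper-casing is just an example transformation):

```shell
#! /bin/sh
# shout: copy input to output in upper case
# with file arguments, read the files; with none, read standard input
if [ $# -gt 0 ]; then
    cat "$@" | tr 'a-z' 'A-Z'
else
    tr 'a-z' 'A-Z'
fi
```

either way it writes to standard output, so it works alone, with file
arguments, or in the middle of a pipeline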
*** more intricate plumbing with tee
to carry the analogy of pipelines further,
it would be nice to be able to split a stream
and do several things with it (in parallel)
the shell doesn't allow this in full generality,
but the command tee
allows you to capture intermediate stages of a pipeline
Q: How would you make a script of your terminal session using tee?
tee inputs | ksh -i 2>&1 | tee outputs
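a smaller illustration of tee splitting a stream (snap is a made-up file name):

```shell
# tee copies stdin both to the named file and to its stdout,
# so you can snapshot the middle of a pipeline
printf 'c\nb\na\n' | sort | tee snap | wc -l    # the pipeline still sees 3 lines
cat snap                                        # snap holds the sorted copy
rm -f snap
```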
Q: How else could you do the job that tee does without using it?
one can imagine a graphical interface that allows more interesting
plumbing to be set up...