Shells
Processes
COP-3402
Table of Contents
Command-line interpreter (shell)
Executes commands
ls ./
Commands
- String input
- Program name and arguments, e.g.,
ls ./ - Built-in commands, e.g.,
cdorexit
Our project will only need to run programs and do some limited piping and redirection.
Command loop
Prompt for another command after execution
do commands = parse(userinput) run(commands) while command != exit
Note that our project will only require executing a single line of commands and will not need to loop.
Processing commands
Command and arguments
- Identify command and arguments
(Diagram)
ls ./ ../
In reality, bash and other command-line interpreters process a full programming language that has structured constructs. We will look at parsing when we talk about compilers.
Redirection example
Diagram
$ ls >out.txt
Identify the program and redirections.
Piping and redirection
Diagram
ls / | wc
Identify the program, redirections, and piping.
Parsing commands
Reading the input
What symbols do we have for programs, arguments, redirection, and piping?
| Text | Meaning |
|---|---|
| ">" | Redirect out |
| "<" | Redirect in |
| "|" | Pipe |
| STRING | Path to a program or its arguments |
| "\n" | End of command |
STRING is a placeholder for any other string of characters
What about whitespace?
Not part of symbols. Delineates arguments.
Strictly speaking, spaces can be used for program names or arguments, but must be escaped or wrapped in quotes.
Shell command grammar
start ::= pipes "\n" pipes ::= command pipe* pipe ::= "|" command command ::= program in? out? program :: STRING STRING* in ::= "<" STRING out ::= ">" STRING
- lowercase words are placeholder symbols for substrings of a command input
- ::= rules define how those input parts are constructed
- * means 0 or mean repetitions of that symbol
- ? means it either appears once or not at all (0 or 1).
Understanding the grammar: example 1
command ::= program in? out?
command is constructed from a program symbol followed (optionally) by redirect in/out.
Understanding the grammar: example 2
program :: STRING STRING*
program is constructed out of a STRING (for the program name) followed by 0 or more other STRINGS (the arguments).
Parser pseudo-code
consume one token
match start
match pipes
match command
match program
consume one or more STRINGs
match in?
if next token is "<", consume it and a STRING token
match out?
if next token is ">", consume it and a STRING token
match pipe*
if next is "|", consume it and repeat matching a command
match "\n"
- How can we "consume one or more STRINGs"? One way is with a while loop (loop until the next token is not a STRING).
- How can we "repeat matching a command"? One way is to put a statement label on the
match commandcode and usegototo branch back to it. Alternatively, wrap thematch commandin a loop that continues until we no longer see a pipe symbol, then continue to finish matchingstart's final "\n" token.
Implementing parsing
The lexer
The lexer breaks the input up into tokens
It is provided for you (lex.l) in the code template.
yylex() calls the lexer and returns one token.
The lexeme (string of the token) is stored in yytext.
Token types
| Grammar Symbol (Terminal) | Token enum |
|---|---|
| string | STRING |
| "|" | PIPE |
| "<" | REDIRECT_IN |
| ">" | REDIRECT_OUT |
| "\n" | END_OF_LINE |
Key variables
The template provides a consume() function that reads from the lexer.
| Global variable | Description |
|---|---|
| lookahead | The next token in the input |
| lexeme | The text of the token |
Example input
ls > ls.out | wc
What are the tokens and the lexemes?
| Token | Lexeme |
|---|---|
| STRING | "ls" |
| REDIRECT_OUT | ">" |
| STRING | "ls.out" |
| PIPE | "|" |
| STRING | "wc" |
Reading in tokens
consume() reads one token and its lexeme
Matching tokens
We provide two functions to help match tokens.
| Function | Description |
|---|---|
| ensure() | Check that two tokens match or fail |
| match() | Check that two tokens match and return true or false. |
- ensure() is useful when only one possible token can come next, e.g., the first STRING or the last "\n".
- match() is useful when there are multiple possible next tokens, e.g., ">" is optional.
Partial implementation of the recognizer
void parse() {
consume();
// TODO: validate start and pipes
command:
ensure(lookahead, STRING);
program:
ensure(lookahead, STRING);
consume();
// TODO: collect arguments as well
// TODO: parse redirect in
out:
if (match(lookahead, REDIRECT_OUT)) {
consume();
ensure(lookahead, STRING);
consume();
}
// TODO: parse pipes
ensure(lookahead, END_OF_LINE);
}
Walk through using consume as well as the provided ensure() and match() functions.
What is the output?
This parser outputs nothing. It validates that the input is grammatical.
This is typically called a recognizer for a language. A parser recognizes a language and builds a data structure, e.g., a parse/syntax tree, for the input.
Creating the command structures
The template provides a linked list data structure command_t:
typedef struct command_s {
char *argv[MAX_ARGS]; // NULL-terminated list.
char *in; // NULL if there is no redirect
char *out; // NULL if there is no redirect
struct command_s *next;
} command_t;
Explain each field
Helper functions
| Function | Description |
|---|---|
add_command() |
Allocate a new struct and append to list |
print_commands() |
Pretty print the command output |
List variables
| Global Variable | Description |
|---|---|
first_command |
First element in the list (or NULL when empty) |
last_command |
Last element in the list (or NULL when empty) |
Partial implementation of the parser
void parse() {
consume();
// TODO: validate start and pipes
command:
ensure(lookahead, STRING);
command_t *command = add_command(); // create new command list entry
program:
ensure(lookahead, STRING);
command->argv[0] = lexeme; // populate data in the list entry
consume();
// TODO: collect arguments as well
// TODO: parse redirect in
out:
if (match(lookahead, REDIRECT_OUT)) {
consume();
ensure(lookahead, STRING);
command->out = lexeme;
consume();
}
// TODO: parse pipes
ensure(lookahead, END_OF_LINE);
// Be sure to remove any debugging output to stdout before submitting
// print_commands();
}
Note that the add_command() function already handles linked list construction for you.
Debugging
Use the provided print_command() function to pretty-print the command input:
print_commands();
Note how differences in space are replaced with a single space bewteen everything.
Note that even if there is no space between tokens, parsing still works and the pretty-printer adds space.
Examples
eustis:~/cop3402fall25/proc$ ./mysh ls>ls.out ls > ls.out
eustis:~/cop3402fall25/proc$ ./mysh ls >ls.out ls > ls.out
eustis:~/cop3402fall25/proc$ ./mysh ls> ls.out ls > ls.out
Making testing faster
Use echo to pass a command to your program, e.g.,
echo "ls > ls.out" | ./mysh
Why does this work?
This avoids having to type each test in by hand every time.
Saving your tests to test scripts
emacs mytest_01.sh- Write
echo "ls > ls.out" | ./mysh - Save with Ctrl-x Ctrl-s. Quit with Ctrl-x Ctrl-c.
- Run your test with
bash mytest_01.sh
Don't overwrite your tests! Create a new file instead. That way you don't have to rethink create old tests, especially if you have a regression.
Interpreting commands
- Parser provides simplified in-memory data structure
- Interpreter walks the list and executes commands
- Ascribes meaning to symbols (command, pipe, redirect, etc.)
Interpreter pseudo-code
if one command:
fork:
if in child:
if redirect in:
open file and redirect stdin from it
if redirect out
open file and redirect stdout to it
exec command with args
if in parent
wait for child
if two commands:
create pipe
fork:
if in child:
redirect stdout to pipe in
if redirect in:
open file and redirect stdin from it
exec command with args
fork:
if in child:
redirect stdin from pipe ou
if redirect out:
open file and redirect stdou to it
exec command with args
close pipe in and out in parent
wait for both children
Arbitrary number of commands
lastpipe = stdin
do
command = get next command
if command is not the last
create a pipe in nextpipefd
else if the last command
nextpipefd = stdout
run_child(command, lastpipefd, nextpipefd)
if a pipe was created, close lastpipefd
nextpipefd = lastpipefd
while there are still commands
wait for all children
run_child(command, lastpipefd, nextpipefd):
fork:
if in child:
redirect stdin from lastpipefd
redirect stdout to nextpipefd
if redirect in:
open file and redirect stdin from it
if redirect out
open file and redirect stdout to it
exec command with args
Executing programs
What syscalls can we use for this?
fork and exec
execvp.c
#include <stdio.h>
#include <unistd.h> // fork(), execvp()
#include <stdlib.h> // exit()
#include <inttypes.h> // intmax_t
// man 2 execvp
int main(int argc, char **argv) {
char *prog = "ls";
// including the NULL terminator and the progname
const int MAXARGV = 32 + 1 + 1;
char **newargv = malloc(sizeof(char *) * (MAXARGV));
newargv[0] = prog;
newargv[1] = "./";
newargv[2] = "../";
newargv[3] = NULL; // NULL-terminate the argv list
execvp(prog, newargv);
}
Partial interpreter implementation
void run_commands() {
if (NULL != first_command && first_command == last_command) {
// one command in the list
command_t *command = first_command;
pid_t pid;
switch (pid = fork()) {
case -1:
perror("fork");
exit(EXIT_FAILURE);
break;
case 0: // child
if (NULL != command->out) {
int outfd = open(command->out, O_RDWR | O_TRUNC | O_CREAT, S_IRUSR | S_IWUSR);
if (-1 == outfd) {
perror("open");
exit(EXIT_FAILURE);
}
if (-1 == dup2(outfd, STDOUT_FILENO)) {
perror("dup2");
exit(EXIT_FAILURE);
}
if (-1 == close(outfd)) {
perror("close");
exit(EXIT_FAILURE);
}
}
execvp(command->argv[0], command->argv);
perror("execvp");
_exit(EXIT_FAILURE);
break;
default: // parent
// nothing to do here. will wait for all children once all child
// processes are created.
break;
}
// wait for children to exit
do {
while (wait(NULL) > 0);
} while (errno != ECHILD);
}
}