Shells
Processes
COP-3402
Table of Contents
Command-line interpreter (shell)
Executes commands
ls ./
Commands
- String input
- Program name and arguments, e.g.,
ls ./
- Built-in commands, e.g.,
cd
orexit
Our project will only need to run programs and do some limited piping and redirection.
Command loop
Prompt for another command after execution
do commands = parse(userinput) run(commands) while command != exit
Note that our project will only require executing a single line of commands and will not need to loop.
Processing commands
Command and arguments
- Identify command and arguments
(Diagram)
ls ./ ../
In reality, bash and other command-line interpreters process a full programming language that has structured constructs. We will look at parsing when we talk about compilers.
Redirection example
Diagram
$ ls >out.txt
Identify the program and redirections.
Piping and redirection
Diagram
ls / | wc
Identify the program, redirections, and piping.
Parsing commands
Reading the input
What symbols do we have for programs, arguments, redirection, and piping?
Text | Meaning |
---|---|
">" | Redirect out |
"<" | Redirect in |
"|" | Pipe |
STRING | Path to a program or its arguments |
"\n" | End of command |
STRING is a placeholder for any other string of characters
What about whitespace?
Not part of symbols. Delineates arguments.
Strictly speaking, spaces can be used for program names or arguments, but must be escaped or wrapped in quotes.
Shell command grammar
start ::= pipes "\n" pipes ::= command pipe* pipe ::= "|" command command ::= program in? out? program :: STRING STRING* in ::= "<" STRING out ::= ">" STRING
- lowercase words are placeholder symbols for substrings of a command input
- ::= rules define how those input parts are constructed
- * means 0 or mean repetitions of that symbol
- ? means it either appears once or not at all (0 or 1).
Understanding the grammar: example 1
command ::= program in? out?
command
is constructed from a program symbol followed (optionally) by redirect in/out.
Understanding the grammar: example 2
program :: STRING STRING*
program
is constructed out of a STRING (for the program name) followed by 0 or more other STRINGS (the arguments).
Parser pseudo-code
consume one token match start match pipes match command match program consume one or more STRINGs match in? if next token is "<", consume it and a STRING token match out? if next token is ">", consume it and a STRING token match pipe* if next is "|", consume it and repeat matching a command match "\n"
- How can we "consume one or more STRINGs"? One way is with a while loop (loop until the next token is not a STRING).
- How can we "repeat matching a command"? One way is to put a statement label on the
match command
code and usegoto
to branch back to it. Alternatively, wrap thematch command
in a loop that continues until we no longer see a pipe symbol, then continue to finish matchingstart
's final "\n" token.
Implementing parsing
The lexer
The lexer breaks the input up into tokens
It is provided for you (lex.l
) in the code template.
yylex()
calls the lexer and returns one token.
The lexeme (string of the token) is stored in yytext.
Token types
Grammar Symbol (Terminal) | Token enum |
---|---|
string | STRING |
"|" | PIPE |
"<" | REDIRECT_IN |
">" | REDIRECT_OUT |
"\n" | END_OF_LINE |
Key variables
The template provides a consume()
function that reads from the lexer.
Global variable | Description |
---|---|
lookahead | The next token in the input |
lexeme | The text of the token |
Example input
ls > ls.out | wc
What are the tokens and the lexemes?
Token | Lexeme |
---|---|
STRING | "ls" |
REDIRECT_OUT | ">" |
STRING | "ls.out" |
PIPE | "|" |
STRING | "wc" |
Reading in tokens
consume()
reads one token and its lexeme
Matching tokens
We provide two functions to help match tokens.
Function | Description |
---|---|
ensure() | Check that two tokens match or fail |
match() | Check that two tokens match and return true or false. |
- ensure() is useful when only one possible token can come next, e.g., the first STRING or the last "\n".
- match() is useful when there are multiple possible next tokens, e.g., ">" is optional.
Partial implementation of the recognizer
void parse() { consume(); // TODO: validate start and pipes command: ensure(lookahead, STRING); program: ensure(lookahead, STRING); consume(); // TODO: collect arguments as well // TODO: parse redirect in out: if (match(lookahead, REDIRECT_OUT)) { consume(); ensure(lookahead, STRING); consume(); } // TODO: parse pipes ensure(lookahead, END_OF_LINE); }
Walk through using consume as well as the provided ensure() and match() functions.
What is the output?
This parser outputs nothing. It validates that the input is grammatical.
This is typically called a recognizer
for a language. A parser
recognizes a language and builds a data structure, e.g., a parse/syntax tree, for the input.
Creating the command structures
The template provides a linked list data structure command_t
:
typedef struct command_s { char *argv[MAX_ARGS]; // NULL-terminated list. char *in; // NULL if there is no redirect char *out; // NULL if there is no redirect struct command_s *next; } command_t;
Explain each field
Helper functions
Function | Description |
---|---|
add_command() |
Allocate a new struct and append to list |
print_commands() |
Pretty print the command output |
List variables
Global Variable | Description |
---|---|
first_command |
First element in the list (or NULL when empty) |
last_command |
Last element in the list (or NULL when empty) |
Partial implementation of the parser
void parse() { consume(); // TODO: validate start and pipes command: ensure(lookahead, STRING); command_t *command = add_command(); // create new command list entry program: ensure(lookahead, STRING); command->argv[0] = lexeme; // populate data in the list entry consume(); // TODO: collect arguments as well // TODO: parse redirect in out: if (match(lookahead, REDIRECT_OUT)) { consume(); ensure(lookahead, STRING); command->out = lexeme; consume(); } // TODO: parse pipes ensure(lookahead, END_OF_LINE); // Be sure to remove any debugging output to stdout before submitting // print_commands(); }
Note that the add_command()
function already handles linked list construction for you.
Debugging
Use the provided print_command()
function to pretty-print the command input:
print_commands();
Note how differences in space are replaced with a single space bewteen everything.
Note that even if there is no space between tokens, parsing still works and the pretty-printer adds space.
Examples
eustis:~/cop3402fall25/proc$ ./mysh ls>ls.out ls > ls.out
eustis:~/cop3402fall25/proc$ ./mysh ls >ls.out ls > ls.out
eustis:~/cop3402fall25/proc$ ./mysh ls> ls.out ls > ls.out
Making testing faster
Use echo
to pass a command to your program, e.g.,
echo "ls > ls.out" | ./mysh
Why does this work?
This avoids having to type each test in by hand every time.
Saving your tests to test scripts
emacs mytest_01.sh
- Write
echo "ls > ls.out" | ./mysh
- Save with Ctrl-x Ctrl-s. Quit with Ctrl-x Ctrl-c.
- Run your test with
bash mytest_01.sh
Don't overwrite your tests! Create a new file instead. That way you don't have to rethink create old tests, especially if you have a regression.
Interpreting commands
- Parser provides simplified in-memory data structure
- Interpreter walks the list and executes commands
- Ascribes meaning to symbols (command, pipe, redirect, etc.)
Interpreter pseudo-code
if one command: fork: if in child: if redirect in: open file and redirect stdin from it if redirect out open file and redirect stdout to it exec command with args if in parent wait for child if two commands: create pipe fork: if in child: redirect stdout to pipe in if redirect in: open file and redirect stdin from it exec command with args fork: if in child: redirect stdin from pipe ou if redirect out: open file and redirect stdou to it exec command with args close pipe in and out in parent wait for both children
Arbitrary number of commands
lastpipe = stdin do command = get next command if command is not the last create a pipe in nextpipefd else if the last command nextpipefd = stdout run_child(command, lastpipefd, nextpipefd) if a pipe was created, close lastpipefd nextpipefd = lastpipefd while there are still commands wait for all children run_child(command, lastpipefd, nextpipefd): fork: if in child: redirect stdin from lastpipefd redirect stdout to nextpipefd if redirect in: open file and redirect stdin from it if redirect out open file and redirect stdout to it exec command with args
Executing programs
What syscalls can we use for this?
fork and exec
execvp.c
#include <stdio.h> #include <unistd.h> // fork(), execvp() #include <stdlib.h> // exit() #include <inttypes.h> // intmax_t // man 2 execvp int main(int argc, char **argv) { char *prog = "ls"; // including the NULL terminator and the progname const int MAXARGV = 32 + 1 + 1; char **newargv = malloc(sizeof(char *) * (MAXARGV)); newargv[0] = prog; newargv[1] = "./"; newargv[2] = "../"; newargv[3] = NULL; // NULL-terminate the argv list execvp(prog, newargv); }
Partial interpreter implementation
void run_commands() { if (NULL != first_command && first_command == last_command) { // one command in the list command_t *command = first_command; pid_t pid; switch (pid = fork()) { case -1: perror("fork"); exit(EXIT_FAILURE); break; case 0: // child if (NULL != command->out) { int outfd = open(command->out, O_RDWR | O_TRUNC | O_CREAT, S_IRUSR | S_IWUSR); if (-1 == outfd) { perror("open"); exit(EXIT_FAILURE); } if (-1 == dup2(outfd, STDOUT_FILENO)) { perror("dup2"); exit(EXIT_FAILURE); } if (-1 == close(outfd)) { perror("close"); exit(EXIT_FAILURE); } } execvp(command->argv[0], command->argv); perror("execvp"); _exit(EXIT_FAILURE); break; default: // parent // nothing to do here. will wait for all children once all child // processes are created. break; } // wait for children to exit do { while (wait(NULL) > 0); } while (errno != ECHILD); } }