Shells
Processes
COP-3402

Command-line interpreter (shell)
Processing commands
Parsing commands
Implementing parsing
Interpreting commands

Command-line interpreter (shell)

Executes commands
```
ls ./
```

Commands

String input
Program name and arguments, e.g., ls ./
Built-in commands, e.g., cd or exit

Our project will only need to run programs and do some limited piping and redirection.

Command loop

Prompt for another command after execution

do
  commands = parse(userinput)
  run(commands)
while command != exit

Note that our project will only require executing a single line of commands and will not need to loop.

Processing commands

Command and arguments

Identify command and arguments
(Diagram)
```
ls ./ ../
```

In reality, bash and other command-line interpreters process a full programming language that has structured constructs. We will look at parsing when we talk about compilers.

Redirection example

Diagram

$ ls >out.txt

Identify the program and redirections.

Piping and redirection

Diagram

ls / | wc

Identify the program, redirections, and piping.

Parsing commands

Reading the input

What symbols do we have for programs, arguments, redirection, and piping?

Text	Meaning
">"	Redirect out
"<"	Redirect in
"\|"	Pipe
STRING	Path to a program or its arguments
"\n"	End of command

STRING is a placeholder for any other string of characters

What about whitespace?

Not part of symbols. Delineates arguments.

Strictly speaking, spaces can be used for program names or arguments, but must be escaped or wrapped in quotes.

Shell command grammar

start     ::= pipes "\n"
pipes     ::= command pipe*
pipe      ::= "|" command
command   ::= program in? out?
program   :: STRING STRING*
in        ::= "<" STRING
out       ::= ">" STRING

lowercase words are placeholder symbols for substrings of a command input
::= rules define how those input parts are constructed
* means 0 or mean repetitions of that symbol
? means it either appears once or not at all (0 or 1).

Understanding the grammar: example 1

command   ::= program in? out?

command is constructed from a program symbol followed (optionally) by redirect in/out.

Understanding the grammar: example 2

program   :: STRING STRING*

program is constructed out of a STRING (for the program name) followed by 0 or more other STRINGS (the arguments).

Parser pseudo-code

consume one token
match start
  match pipes
    match command
      match program
        consume one or more STRINGs
      match in?
        if next token is "<", consume it and a STRING token
      match out?
        if next token is ">", consume it and a STRING token
    match pipe*
      if next is "|", consume it and repeat matching a command
  match "\n"

How can we "consume one or more STRINGs"? One way is with a while loop (loop until the next token is not a STRING).
How can we "repeat matching a command"? One way is to put a statement label on the match command code and use goto to branch back to it. Alternatively, wrap the match command in a loop that continues until we no longer see a pipe symbol, then continue to finish matching start's final "\n" token.

Implementing parsing

The lexer

The lexer breaks the input up into tokens

It is provided for you (lex.l) in the code template.

yylex() calls the lexer and returns one token.

The lexeme (string of the token) is stored in yytext.

Token types

Grammar Symbol (Terminal)	Token enum
string	STRING
"\|"	PIPE
"<"	REDIRECT_IN
">"	REDIRECT_OUT
"\n"	END_OF_LINE

Key variables

The template provides a consume() function that reads from the lexer.

Global variable	Description
lookahead	The next token in the input
lexeme	The text of the token

Example input

ls > ls.out | wc

What are the tokens and the lexemes?

Token	Lexeme
STRING	"ls"
REDIRECT_OUT	">"
STRING	"ls.out"
PIPE	"\|"
STRING	"wc"

Reading in tokens

consume() reads one token and its lexeme

Matching tokens

We provide two functions to help match tokens.

Function	Description
ensure()	Check that two tokens match or fail
match()	Check that two tokens match and return true or false.

ensure() is useful when only one possible token can come next, e.g., the first STRING or the last "\n".
match() is useful when there are multiple possible next tokens, e.g., ">" is optional.

Partial implementation of the recognizer

void parse() {
  consume();

  // TODO: validate start and pipes

 command:
  ensure(lookahead, STRING);

 program:
  ensure(lookahead, STRING);
    consume();
  // TODO: collect arguments as well

 // TODO: parse redirect in

 out:
  if (match(lookahead, REDIRECT_OUT)) {
    consume();
    ensure(lookahead, STRING);
      consume();
  }

  // TODO: parse pipes

  ensure(lookahead, END_OF_LINE);
}

Walk through using consume as well as the provided ensure() and match() functions.

What is the output?

This parser outputs nothing. It validates that the input is grammatical.

This is typically called a recognizer for a language. A parser recognizes a language and builds a data structure, e.g., a parse/syntax tree, for the input.

Creating the command structures

The template provides a linked list data structure command_t:

typedef struct command_s {
  char *argv[MAX_ARGS];  // NULL-terminated list.
  char *in;  // NULL if there is no redirect
  char *out;  // NULL if there is no redirect
  struct command_s *next;
} command_t;

Explain each field

Helper functions

Function	Description
`add_command()`	Allocate a new struct and append to list
`print_commands()`	Pretty print the command output

List variables

Global Variable	Description
`first_command`	First element in the list (or NULL when empty)
`last_command`	Last element in the list (or NULL when empty)

Partial implementation of the parser

void parse() {
  consume();

  // TODO: validate start and pipes

 command:
  ensure(lookahead, STRING);
  command_t *command = add_command();  // create new command list entry

 program:
  ensure(lookahead, STRING);
  command->argv[0] = lexeme;  // populate data in the list entry
  consume();
  // TODO: collect arguments as well

 // TODO: parse redirect in

 out:
  if (match(lookahead, REDIRECT_OUT)) {
    consume();
    ensure(lookahead, STRING);
    command->out = lexeme;
    consume();
  }

  // TODO: parse pipes

  ensure(lookahead, END_OF_LINE);

  // Be sure to remove any debugging output to stdout before submitting
  // print_commands();
}

Note that the add_command() function already handles linked list construction for you.

Debugging

Use the provided print_command() function to pretty-print the command input:

print_commands();

Note how differences in space are replaced with a single space bewteen everything.

Note that even if there is no space between tokens, parsing still works and the pretty-printer adds space.

Examples

eustis:~/cop3402fall25/proc$ ./mysh 
ls>ls.out
ls > ls.out

eustis:~/cop3402fall25/proc$ ./mysh 
ls >ls.out
ls > ls.out

eustis:~/cop3402fall25/proc$ ./mysh 
ls>    ls.out
ls > ls.out

Making testing faster

Use echo to pass a command to your program, e.g.,

echo "ls > ls.out" | ./mysh

Why does this work?

This avoids having to type each test in by hand every time.

Saving your tests to test scripts

emacs mytest_01.sh
Write echo "ls > ls.out" | ./mysh
Save with Ctrl-x Ctrl-s. Quit with Ctrl-x Ctrl-c.
Run your test with bash mytest_01.sh

Don't overwrite your tests! Create a new file instead. That way you don't have to rethink create old tests, especially if you have a regression.

Interpreting commands

Parser provides simplified in-memory data structure
Interpreter walks the list and executes commands
Ascribes meaning to symbols (command, pipe, redirect, etc.)

Interpreter pseudo-code

if one command:
  fork:
    if in child:
      if redirect in:
        open file and redirect stdin from it
      if redirect out
        open file and redirect stdout to it
      exec command with args
    if in parent
      wait for child
if two commands:
  create pipe
  fork:
    if in child:
      redirect stdout to pipe in
      if redirect in:
        open file and redirect stdin from it
      exec command with args
  fork: 
    if in child:
      redirect stdin from pipe ou
      if redirect out:
        open file and redirect stdou to it
      exec command with args
  close pipe in and out in parent    
  wait for both children

Arbitrary number of commands

lastpipe = stdin
do 
  command = get next command
  if command is not the last
    create a pipe in nextpipefd
  else if the last command
    nextpipefd = stdout
  run_child(command, lastpipefd, nextpipefd)
  if a pipe was created, close lastpipefd
  nextpipefd = lastpipefd
while there are still commands
wait for all children

run_child(command, lastpipefd, nextpipefd):
  fork:
    if in child:
      redirect stdin from lastpipefd
      redirect stdout to nextpipefd
      if redirect in:
        open file and redirect stdin from it
      if redirect out
        open file and redirect stdout to it
      exec command with args

Executing programs

What syscalls can we use for this?

fork and exec

`execvp.c`

#include <stdio.h>
#include <unistd.h>    // fork(), execvp()
#include <stdlib.h>    // exit()
#include <inttypes.h>  // intmax_t

// man 2 execvp

int main(int argc, char **argv) {
  char *prog = "ls";
  // including the NULL terminator and the progname
  const int MAXARGV = 32 + 1 + 1;
  char **newargv = malloc(sizeof(char *) * (MAXARGV));
  newargv[0] = prog;
  newargv[1] = "./";
  newargv[2] = "../";
  newargv[3] = NULL;  // NULL-terminate the argv list
  execvp(prog, newargv);
}

Partial interpreter implementation

void run_commands() {
  if (NULL != first_command && first_command == last_command) {
    // one command in the list

    command_t *command = first_command;

    pid_t pid;
    switch (pid = fork()) {
    case -1:
      perror("fork");
      exit(EXIT_FAILURE);
      break;
    case 0: // child

      if (NULL != command->out) {
        int outfd = open(command->out, O_RDWR | O_TRUNC | O_CREAT, S_IRUSR | S_IWUSR);
        if (-1 == outfd) {
          perror("open");
          exit(EXIT_FAILURE);
        }
        if (-1 == dup2(outfd, STDOUT_FILENO)) {
          perror("dup2");
          exit(EXIT_FAILURE);
        }
        if (-1 == close(outfd)) {
          perror("close");
          exit(EXIT_FAILURE);
        }
      }

      execvp(command->argv[0], command->argv);
      perror("execvp");
      _exit(EXIT_FAILURE);
      break;
    default: // parent
      // nothing to do here.  will wait for all children once all child
      // processes are created.
      break;
    }

    // wait for children to exit
    do {
      while (wait(NULL) > 0);
    } while (errno != ECHILD);
  }
}

Shells Processes COP-3402

Table of Contents

Command-line interpreter (shell)

Commands

Command loop

Processing commands

Command and arguments

Redirection example

Diagram

Piping and redirection

Diagram

Parsing commands

Reading the input

What about whitespace?

Shell command grammar

Understanding the grammar: example 1

Understanding the grammar: example 2

Parser pseudo-code

Implementing parsing

The lexer

Token types

Key variables

Example input

Reading in tokens

Matching tokens

Partial implementation of the recognizer

What is the output?

Creating the command structures

Helper functions

List variables

Partial implementation of the parser

Debugging

Examples

Making testing faster

Saving your tests to test scripts

Interpreting commands

Interpreter pseudo-code

Executing programs

execvp.c

Partial interpreter implementation

Shells
Processes
COP-3402

`execvp.c`