Learning to Script

01 Jun 2021

I’m pretty sure I first got interested in Linux around the same time Edward Snowden fled the US. I was 13 years old and captivated by how much one could determine from the metadata and other stuff the NSA was collecting. There was also some part of me that really, really hated the idea that someone out there knew what I was doing on the computer (again, I was 13, so that was mostly downloading Minecraft mods). While exploring ways to be more private online, I was introduced to the open source community and Linux.

I put Ubuntu on the first computer I built myself later that year, and after failing to figure out wireless card drivers, caved and bought Windows. The fascination with Linux remained, and today, it’s all I use on my personal computer.

So of course, when I saw a special topics course in the CS department titled “Linux Scripting”, I signed up right away. Like any Linux user I’d tried to use the terminal before, and figured out some basics, but generally avoided it where I could. And like most math majors, I was really excited to take a class where I’d learn stuff I could use on a daily basis. While no doubt run-of-the-mill for the engineers in the class, I found it really cool that everything we did was something I could reasonably see myself doing with my own machine. The assignments were wonderfully open-ended: we got to build whatever we wanted with the scripting language we had talked about that week, and record a short video talking about what we built.

The course covered more than half a dozen languages, including PHP, Perl, JavaScript, Bash, Tcl, Python, and awk. We also spent a lot of time on regular expressions, even though they aren’t a language of their own. Of these, I’d only written extensively in Python and, by chance, a few times in Bash.

To my surprise, the current maintainer of Bash, Chet Ramey, works at CWRU. Including Bash under ‘My Projects’ on a personal website has got to be one of the biggest FOSS flexes I’ve ever seen. He came into our class to talk about how he got involved in the project and some of the challenges of maintaining support for very old code.

Anyway, I thought it would be fun to write up some of my favorite projects from that class. Writing Bash feels like such a pure form of coding to me, in that it’s a lot of frustration and things not working until I find the exact issue or just the right way to do something, and then it all falls into place in a rush of satisfaction.

(A quick digression: a friend once pointed out to me that code will work on the first try if you write it correctly. I doubt I’ve ever been so annoyed in my life).

There are many, many definitions of what a ‘scripting language’ is, and that was a source of great discussion in class. I don’t want to attempt to retread the same ground you can find on a million different blogs and stackoverflows, so I’ll summarise here how I use scripting: either for something very simple, or very temporary.

I have some startup scripts that run every time a program is opened or I log into my computer, and those scripts do very simple things: launching a program or changing the appearance of Vim, for example. I also sometimes write one-liners to do quick-and-dirty operations, like finding uses of a particular command in a set of R files. That’s about the amount of programming/computing I do that I consider ‘scripting’. That said, I really enjoy this hacky approach to problem-solving: it sort of represents the can-do spirit I like about Linux and open source. Anyways.

Cleaning up File Names

The PI in charge of the lab I’ve been part of since freshman year tells everyone in the group, over and over, never to give a file a name with spaces in it. When I was on Windows, I used spaces all the time, since they looked nicer in the Explorer window. On Linux, as well as any system with a lot of similarly named files, that’s an awful way to do things because of how annoying whitespace is to parse. On the command line, a file or folder name with spaces has to be quoted (or have each space escaped with a backslash), like so:

# the following doesn't work
cd folder with spaces
# this line does
cd 'folder with spaces'
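
Single quotes are only one of the options. As a minimal sketch (the scratch directory and the dir variable are made up for the demonstration), these forms should all land in the same folder:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is touched.
scratch="$(mktemp -d)"
mkdir "$scratch/folder with spaces"
cd "$scratch" || exit 1

cd 'folder with spaces' && cd ..   # single quotes: everything is literal
cd "folder with spaces" && cd ..   # double quotes: variables still expand
cd folder\ with\ spaces && cd ..   # a backslash before each space

dir="folder with spaces"           # hypothetical variable for illustration
cd "$dir" && cd ..                 # quote variables too, or they split on spaces
```

The last form is the one that tends to bite in scripts: an unquoted $dir would be split into three separate words before cd ever sees it.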

For my first video assignment in Bash, I thought it’d be useful to have a script which removes spaces from file names and replaces them with underscores. The basic functions here (listing file names and editing strings) are very well-suited to Bash and scripting in general. I ended up running this over my entire home directory, which fixed some files I had named with spaces by accident, some oddly-named downloads, and a few crash logs from MATLAB.

Here’s the code I turned in:

ls -R | while read line; do
    if [[ ${line:(-1)} == ":" ]]; then
        directory="${line::(-1)}"
    fi
    if [[ "$line" =~ [[:space:]] ]]; then
        mv "$directory/$line" "$directory/${line// /_}"
    fi
done

So much fun stuff there. The | character is a pipe, which gives the output of the first command to the following command as input. The Unix Philosophy of having many utilities that do one thing very well is so satisfying to me: it’s ‘clean’ in some way that I have a hard time describing.
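
The same pipe-into-a-loop shape works with other listing tools. One weak spot of reading ls -R output is that a directory can be renamed before the files inside it are reached, so their old paths no longer exist. Below is a sketch of a variant built on find instead, not the script that was turned in. It assumes GNU or BSD find (for -mindepth and -print0), and rename_spaces is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: rename entries containing spaces, deepest paths first, so a parent
# directory is only renamed after everything inside it has been handled.
# rename_spaces is a hypothetical name, not part of the original assignment.
rename_spaces() {
    local root="${1:-.}" path dir base
    # -mindepth 1 skips the starting directory itself.
    # -depth lists a directory's contents before the directory.
    # -print0 with read -d '' keeps unusual file names intact.
    find "$root" -mindepth 1 -depth -name '* *' -print0 |
    while IFS= read -r -d '' path; do
        dir="${path%/*}"      # everything before the last slash
        base="${path##*/}"    # everything after the last slash
        # -n refuses to overwrite an existing file with the new name.
        mv -n -- "$path" "$dir/${base// /_}"
    done
}
```

Calling rename_spaces with a directory argument processes that tree; with no argument it starts from the current directory. Only the last path component is rewritten on each step, which is why the deepest-first ordering matters.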

By the way, I don’t intend this to be a tutorial on Bash or anything like that. If you’re curious and want to learn more, try the manual pages accessed in the terminal with man ls, for example. I also really like the How-To Geek guides on individual Linux commands, written by Dave McKay.

The double bracket notation for conditionals, combined with the parameter expansions, was my professor’s favorite part of Bash. I’ve come to appreciate both for how powerful they are (especially that 6th line), but the notation still feels needlessly terse to me, as do so many conventions from the era when memory and storage were not cheap. These are the sorts of things that make a friend of mine call the early Unix utility devs wizards. It builds into this sense of lore that I think many people in my generation get when we play with older tools.
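
To make the notation a little less cryptic, here is a small sketch that pulls the expansions from the script above apart one at a time. The sample value of line is invented, and the negative-length form assumes Bash 4.2 or newer.

```shell
#!/usr/bin/env bash
line="./my folder:"            # invented sample, shaped like an ls -R header

last_char="${line:(-1)}"       # substring starting one character from the end
without_last="${line::(-1)}"   # everything except the last character (Bash 4.2+)
underscored="${line// /_}"     # replace every space with an underscore

# [[ ... =~ ... ]] tests the string against a regular expression.
if [[ "$line" =~ [[:space:]] ]]; then
    has_space=yes
else
    has_space=no
fi

printf '%s\n' "$last_char" "$without_last" "$underscored" "$has_space"
# prints, one per line:
#   :
#   ./my folder
#   ./my_folder:
#   yes
```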

Finding Functions in R files

My last project for the class could have been done in Python or JavaScript, but I was really enjoying getting used to Bash and I had a good problem for the language. I maintain an R package for my lab, SunsVoc, which provides a way for photovoltaics researchers to create artificial IV curves from outdoor time series data. Our lab collects IV curves from freshly built solar panels with an indoor testing machine called Suns-Voc, so the package provides a great way to compare the attributes of the panel after exposure and weathering.

R packages have a relatively simple structure, including an “R” folder which contains all the package functions. A common frustration when working on these packages is having to change how one function works, then having to track down each occurrence of it to see where it may cause issues. No doubt there are better ways to write the code to ensure future changes don’t create future issues, but our lab develops this code mostly by having one or two researchers prototype the code, then once a paper is published on the concept, the same researchers and others work on packaging the code for others to use. So best software engineering practices are usually of secondary concern to developing the core functions.

Anyways, I started using grep for the first time a while ago with the terminal built into RStudio in order to find the places where I use functions. The idea for the project was to grep through the “R” folder, find all the function definitions, and print their locations with all the locations they are called. My implementation is far from perfect: it misses uses of functions within apply functions, for example. That said, it satisfies a need for me, and likely wouldn’t be too difficult to make more thorough. This does sort of break the description I wrote above about how I tend to use scripting languages, which brings up interesting questions about the best way to do this sort of task. I’m always open to suggestions!

The main source of my attraction towards Bash is building up commands bit-by-bit. For the first part of this project, I tested in the terminal each of the following snippets:

# find function declarations, with location of file and line
grep -rn '<- function(.*' | sed 's/<-.*$//'

# Find use of a function, func_name
grep -r 'func_name(.*)'

# Exclude comments and examples
grep -v "#'"

# Match only nested functions
grep -rn '<- function(.*' | sed 's/<-.*$//' | sed 's/^.*:.*://' | grep '^[[:space:]]'

This eventually became:

#! /usr/bin/bash

declare -a functions
readarray -t functions < <(grep -rn '<- function(.*' | sed 's/<-.*$//' | sed 's/^.*:.*://')

for i in "${functions[@]}"; do
    if [[ $i =~ \# ]]; then
          :
    elif [[ $i =~ ^[[:space:]] ]]; then
        func="$(echo "$i" | sed 's/^[[:space:]]*//')"
        printf "Nested Function: %s\n" $func
        unset func
    else
        printf "Function: %s\n" $i
    fi
done

unset IFS

I wasn’t happy with using the array, though, since it made the code more verbose than it needed to be. After fiddling with delimiters, I got to this:

#! /usr/bin/bash

IFS=$'\n'

#grep -rn '<- function(.*' | sed 's/<-.*$//' | sed 's/^.*:.*://' | while read i; do
for i in $(grep -rn '<- function(.*' | sed 's/<-.*$//' | sed 's/^.*:.*://'); do
    if [[ $i =~ \# ]]; then
          :
    elif [[ $i =~ ^[[:space:]] ]]; then
        func="$(echo "$i" | sed 's/^[[:space:]]*//')"
        printf "Nested Function: %s\n" $func
        unset func
    else
        printf "Function: %s\n" $i
    fi
done

unset IFS

I just had to set IFS (the internal field separator) to a newline character to get the same behavior from a for loop as from readarray. To me, this method is just as easy to read and simpler to troubleshoot, since it cuts down on the number of unique functions used.
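
A small sketch of what IFS changes, using made-up sample text rather than grep output (readarray assumes Bash 4 or newer):

```shell
#!/usr/bin/env bash
# Two lines of invented sample input, each containing a space.
sample=$'first item\nsecond item'

# Default IFS (space, tab, newline): the loop sees four words.
count_default=0
for word in $sample; do
    count_default=$((count_default + 1))
done

# IFS set to newline only: the loop sees two lines.
old_ifs="$IFS"
IFS=$'\n'
count_newline=0
for entry in $sample; do
    count_newline=$((count_newline + 1))
done
IFS="$old_ifs"    # put the previous value back

# readarray -t splits on newlines regardless of IFS.
readarray -t entries <<< "$sample"

printf 'default IFS: %d, newline IFS: %d, readarray: %d\n' \
    "$count_default" "$count_newline" "${#entries[@]}"
# prints: default IFS: 4, newline IFS: 2, readarray: 2
```

One caveat that applies to both loops: an unquoted expansion is still subject to filename globbing, so a line containing * or ? could expand to matching file names.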

Next, I wanted to find every place a function is used:

#!/usr/bin/bash

# Input should be the name of the function you're looking for
func=$1
# Separate the array items below by newlines, not spaces
IFS=$'\n'

# Grep for that function name (being used as a function)

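for i in $(grep -rn "${func}(" | grep -v '^.*#'); do
    # Everything up to the last "R" is taken as the file name
    file="$(echo "$i" | grep -o '^.*R')"
    # The line number is the run of digits between two colons
    line="$(echo "$i" | grep -Eo ':[0-9]+:' | sed 's/://g')"
    printf "Used on line %s in file %s\n" "$line" "$file"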
done
unset IFS

I remember hating all the time we spent manipulating strings in my first comp sci class in high school. I’ve come to appreciate it a lot more now that I’m doing it with a purpose in mind: arbitrary problems like “remove everything before the first occurrence of this substring” are both easy to do with Bash and much more satisfying when they get you what you want.
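
As an illustration of that kind of problem, Bash can trim a prefix or suffix with parameter expansion alone, without calling grep or sed. The sketch below takes an invented line shaped like grep -rn output (the file name and R code are hypothetical) and splits it at the colons:

```shell
#!/usr/bin/env bash
# Invented example of a "file:line:code" match, as grep -rn would print it.
match='R/iv_curves.R:42:  result <- fit_curve(data)'

file="${match%%:*}"    # drop the longest suffix starting at a ":"
rest="${match#*:}"     # drop the shortest prefix ending at a ":"
line_no="${rest%%:*}"  # same trick again on what is left
code="${rest#*:}"      # everything after the second ":"

printf 'file=%s line=%s code=%s\n' "$file" "$line_no" "$code"
# prints: file=R/iv_curves.R line=42 code=  result <- fit_curve(data)
```

A single # or % removes the shortest match and a doubled ## or %% removes the longest, which is the difference between “the first occurrence” and “the last occurrence” of the substring.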

The only step remaining was to combine them together:

#!/usr/bin/bash

# Get a function, then print the uses of that function

IFS=$'\n' 
for i in $(source ~/Documents/CSDS397/video3/find_funcs.sh);
do
    # First, print the name of the function we'll be examining
    printf '%s\n' "$i"
    # Now, get that $i down to just the function name
    func="$(echo $i | grep -o ": .*$")"
    func=${func#* }
    func=$(echo $func | sed 's/ //g')

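    # The trailing "echo x" keeps $(...) from stripping the newlines at the
    # end of the output; the x itself is removed on the next line.
    output=$(source ~/Documents/CSDS397/video3/find_func_uses.sh $func; echo x)
    output=${output%?}
    printf '%s' "$output"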
done

This one was interesting for the printing challenges. I’m not sure that I have the most concise form of this code, but I landed here after playing with trailing whitespace and adding newlines and so on for an hour or two. This was definitely the most tedious part of developing the script, since I was so close to done but still dealing with Bash quirks.
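
The quirk behind the echo x line is that command substitution throws away every trailing newline. A minimal sketch with an invented two-newline string shows the difference:

```shell
#!/usr/bin/env bash
# Command substitution drops all trailing newlines...
plain=$(printf 'line one\n\n')

# ...unless a sentinel character comes after them and is trimmed afterwards.
kept=$(printf 'line one\n\n'; echo x)
kept=${kept%?}    # remove the single trailing "x"

printf 'plain: %d characters, kept: %d characters\n' "${#plain}" "${#kept}"
# prints: plain: 8 characters, kept: 10 characters
```

Printing the captured text with printf '%s' "$kept" then reproduces it exactly, blank lines included.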

Concluding Thoughts

I still use the three scripts I wrote for finding R functions all the time. They were immediately useful for finding some small problems, like a nested function that was defined identically in two different functions or a few deprecated functions I hadn’t yet removed. What’s most helpful is having the greps and seds on hand to quickly get what I want instead of having to come up with a new one-liner each time.

Since the semester ended, I’ve found myself using Bash and the command line much more often. At this point, the only two things I typically have open are Firefox and a few terminals, plus the occasional PDF reader if I’m annotating something. I’ve found this to be marginally quicker than using file explorers and other applications, but also much more satisfying. That’s for most of the same reasons I’ve come to enjoy scripting: this sense of freedom in personal control.
