What is Zsh and why you should use it instead of Bash

Zsh (short for Z-Shell) is, in technical terms, a UNIX Command Interpreter (often nicknamed shell by the community) , and in more simple terms a command prompt for UNIX and Linux based computer systems.

Yesterday, Software Engineer Ali Spittel announced on Twitter that Apple’s macOS Operating System will be changing its default shell to Zsh from Bash.

I have been a long-term Zsh user, and I learned about Zsh from Thoughtbot’s laptop-setup-script on GitHub about a year or two ago. Since Zsh will become “the new normal” for macOS users, I decided to write about what it is and why you should be using it over Bash.

What advantages does Zsh bring over Bash?

Zsh borrows a lot of features and functionality from both Ksh and Csh and brings in the best of both while still adding its own spin on things. A few special features I liked that appear to be unique to Zsh (some of these may require you to enable them in autoload first) are as followed:

  • zmv, a built in tool to do mass renames of files using a pattern. For example, to rename every file in a directory to an HTML file you could run zmv –C '(*)(#q.)' '$1.html'.
  • zcalc, a built in calculator to zsh, it takes the same syntax as most programming languages to evaluate and return the result of a mathematical expression.
  • Expanded command syntax to allow advanced globbing and recursive searches.
  • Redirects, you can type echo "Hello World" >log.txt >&1 to print “Hello World” to log.txt and to the stdout at the same time. This makes permanently logging output as you see it possible.

Those are cool but that’s just scratching the surface, the following sections look at the bigger parts of Zsh.

A powerful plugin ecosystem

Another feature of Zsh that I am a fan of is the powerful plugin ecosystem that comes along with it. In the screenshot at the beginning of this post, I am using a terminal emulator called Hyper Terminal to run Zsh which shows the Git plugin in action as it displays a project’s branch information.

You can see a nice list of plugins on GitHub with the awesome-zsh-plugins list. If you want a suggestion of something to try out as you learn how to use zsh plugins, and you’re a frequent reader of my blog, please consider the crystal-zsh plugin. It adds a lot of useful aliases for Crystal Developers like me and saves on typing time.

Several community maintained frameworks

Getting started with Zsh can be complicated, so can all of the necessary installs of plugins and themes. The community has recognized this and created several frameworks to help you get started faster with Zsh. For a beginner I recommend oh-my-zsh, it’s easy to get started with and has quite the community behind it. It’s very well documented and was easy to setup and configure.

Sweet, sweet, shell themes

Zsh supports themes, just like you can install a custom theme in your text editor, Zsh itself has themes to make your shell look as appealing as possible while you work with it. I recommend taking a look at oh-my-zsh’s themes page on their Wiki if you want to get started with choosing a theme. You can even have Zsh pick a theme at random (out of your installed themes) each time you open a new session.

No long-lived command history

Unlike Bash which is built-into the current version of macOS, Zsh does not have a long-lived command history (I’m not a fan of command histories because of privacy concerns and when I used Bash, I had to clear mine frequently and it was annoying to say the least). While it’s still possible that Apple will make changes to their builds of Zsh to enable the command history by default (which has a setting to disable it), a reason I’m a fan of Zsh is that the command history is turned off by default.

The build of Zsh that I run is from the Homebrew package manager and it lacks the command history (at least by default) and that has built some trust between Zsh and I. The builds on Homebrew do not include Apple’s changes to Zsh so the history concern isn’t an issue for me yet. In the future it could be but not today.

It’s worth noting here that when installing the framework oh-my-zsh, it’s worth noting a command history will be enabled. Remember to adjust your zsh configuration accordingly if this is an issue for you.

This is a smaller behavior of Zsh, and while some people may dislike it, the feature aligns with my personal values and is yet another reason I choose to support Zsh. The behavior shows Zsh respects user choice and takes an opt-in approach before collecting any data about the user’s activity (even locally).

Who maintains it and develops new features?

There is a Zsh mailing list where those who wish to can discuss changes to the software and contribute patches. The community development of Zsh is one of the things I like most about it. There’s no right or wrong way to submit a patch, although if you submit to the mailing list you’ll probably receive some criticism and people are free to choose whether or not to include your changes in their builds of Zsh.

Where can I learn more before I install Zsh?

I would recommend reading over awesome-zsh (not to be confused with awesome-zsh-plugins) on GitHub, it provides even more information on why Zsh is awesome. If you have a more specific question about Zsh, search engines are your friend. Ali’s article was also a good read and showed some cool things you can do.

Conclusion

This post required a lot of research, reading, and experimenting to learn about Zsh’s features and unique tools (a few I didn’t know about until the time of writing). I don’t hate Bash and I still use it on quite a few production servers of mine, but Zsh continues to impress me for my local development environment. I hope I’ve convinced you to give Zsh a try.

Crystal Lang: Mapping JSON files in under a second

Crystal Language Logo
Logo of the Crystal Language

Crystal’s performance is useful in IO intense tasks such as mapping JSON files into an object. Recently I worked on a project with my friend David Colombo. David needed to take JSON files, map the data onto an object, and insert it into a SQLite Database. This post describes mapping JSON files in Crystal.

Background

Previously, David had created a parser in NodeJS that reads through the JSON files containing structured data and inserts the data into an SQLite database. These JSON Files contained 1,000s of keys, and are 100s of megabytes each. Unfortunately due to limitations of the language and some anti-practices that JavaScript allows, the script took 14 hours to complete it’s task. Mapping JSON files this way is too slow. David attempted to write a parser in Python. The parser was limited to 9999 keys and couldn’t meet the project’s requirements. He sent me a message asking if I was still working with Ruby and asked if it’d be any faster. I said that while I’m able to write a script in Ruby just fine, I had been experimenting with a language called Crystal (a language with Ruby-like syntax and C-like performance) and asked if he’d be open to trying it instead of Ruby. I also wasn’t able to predict the performance of Ruby ahead of time and was not prepared to provide an answer on whether Ruby would be faster. Since I’ve been learning Crystal I decided to give the rewrite in it a shot.

A few requirements

We want the code to be long lived and require minimal changes to the project’s dependencies. We also wanted to avoid using third party libraries, only using Crystal’s standard library was a goal. The standard library does not support databases. We decided to use the official shard for SQLite. It’s maintained by the Crystal Core Team so it will probably be well maintained.

An object to represent the JSON Files

To maximize performance I decided to use a nested struct to contain the data using Crystal’s JSON.mapping() (https://crystal-lang.org/api/0.28.0/JSON.html#mapping). The data won’t be changed at runtime. Data is only copied into SQLite. Stack memory is cheaper than heap memory so the drawbacks of structs were worth it. In exchange we gained performance benefits while mapping JSON files.

module MyProgram
  extend self
  struct CVE_Data_Entity
    struct CVE_Items
      struct CVE
        struct DataMeta
          JSON.mapping(
            "id": {key: "ID", type: String, nilable: true},
            "assigner": {key: "ASSIGNER", type: String, nilable: true}
          )
        end
        struct Affects
          struct Vendor
            struct VendorData
              struct Product
                struct Data
                  struct Version
                    struct Data
                      JSON.mapping(
                        "version_value": {type: String, nilable: true},
                        "version_affected": {type: String, nilable: true},
                      )
                    end
                    JSON.mapping(
                      "version_data": {type: Array(Data), nilable: true},
                    )
                  end
                  JSON.mapping(
                    "product_name": {type: String, nilable: true},
                    "version": {type: Version, nilable: true}
                  )
                end
                JSON.mapping(
                  "product_data": {type: Array(Data), nilable: true},
                )
              end
              JSON.mapping(
                "vendor_name": {type: String, nilable: true},
                "product": {type: Product, nilable: true},
              )
            end
            JSON.mapping(
              "vendor_data": {type: Array(VendorData), nilable: true},
            )
          end
          JSON.mapping(
            "vendor": {type: Vendor, nilable: true},
          )
        end
        struct Problemtype
          struct Data
            struct Description
              JSON.mapping(
                "lang": {type: String, nilable: true},
                "value": {type: String, nilable: true}
              )
            end
            JSON.mapping(
              "description": {type: Array(Description), nilable: true}
            )
          end
          JSON.mapping(
            "problemtype_data": {type: Array(Data), nilable: true},
          )
        end
        struct References
          struct Data
            JSON.mapping(
              "url": {type: String, nilable: true},
              "name": {type: String, nilable: true},
              "refsource": {type: String, nilable: true},
              "tags": {type: Array(String), nilable: true},
            )
          end
          JSON.mapping(
            "reference_data": {type: Array(Data), nilable: true},
          )
        end
        struct Description
          struct Data
            JSON.mapping(
              "lang": {type: String, nilable: true},
              "value": {type: String, nilable: true},
            )
          end
          JSON.mapping(
            "description_data": {type: Array(Data), nilable: true}
          )
        end
        JSON.mapping(
          "data_type": {type: String, nilable: true},
          "data_format": {type: String, nilable: true},
          "data_version": {type: String, nilable: true},
          "cve_data_meta": {key: "CVE_data_meta", type: DataMeta, nilable: true},
          "affects": {type: Affects, nilable: true},
          "problemtype": {type: Problemtype, nilable: true},
          "references": {type: References, nilable: true},
          "description": {type: Description, nilable: true},
        )
      end
      struct Configurations
        struct Nodes
          struct CPE
            JSON.mapping(
              "vulnerable": {type: Bool, nilable: true},
              "cpe23Uri": {type: String, nilable: true},
            )
          end
          JSON.mapping(
            "operator": {type: String, nilable: true},
            "cpe_match": {type: Array(CPE), nilable: true},
          )
        end
        JSON.mapping(
          "cve_data_version": {key: "CVE_data_version", type: String, nilable: true},
          "nodes": {type: Array(Nodes), nilable: true},
        )
      end
      struct Impact
        struct BaseMetricV3
          struct CvssV3
            JSON.mapping(
              "version": {type: String, nilable: true},
              "vectorString": {type: String, nilable: true},
              "attackVector": {type: String, nilable: true},
              "attackComplexity": {type: String, nilable: true},
              "privilegesRequired": {type: String, nilable: true},
              "userInteraction": {type: String, nilable: true},
              "scope": {type: String, nilable: true},
              "confidentialityImpact": {type: String, nilable: true},
              "integrityImpact": {type: String, nilable: true},
              "availabilityImpact": {type: String, nilable: true},
              "baseScore": {type: Float64, nilable: true},
              "baseSeverity": {type: String, nilable: true},
            )
          end
          JSON.mapping(
            "cvssV3": {type: CvssV3, nilable: true},
            "exploitabilityScore": {type: Float64, nilable: true},
            "impactScore": {type: Float64, nilable: true},
          )
        end
        struct BaseMetricV2
          struct CvssV2
            JSON.mapping(
              "version": {type: String, nilable: true},
              "vectorString": {type: String, nilable: true},
              "accessVector": {type: String, nilable: true},
              "accessComplexity": {type: String, nilable: true},
              "authentication": {type: String, nilable: true},
              "confidentialityImpact": {type: String, nilable: true},
              "integrityImpact": {type: String, nilable: true},
              "availabilityImpact": {type: String, nilable: true},
              "baseScore": {type: Float64, nilable: true},
            )
          end
          JSON.mapping(
            "cvssV2": {type: CvssV2, nilable: true},
            "severity": {type: String, nilable: true},
            "exploitabilityScore": {type: Float64, nilable: true},
            "impactScore": {type: Float64, nilable: true},
            "acInsufInfo": {type: Bool, nilable: true},
            "obtainAllPrivilege": {type: Bool, nilable: true},
            "obtainUserPrivilege": {type: Bool, nilable: true},
            "obtainOtherPrivilege": {type: Bool, nilable: true},
            "userInteractionRequired": {type: Bool, nilable: true},
          )
        end
        JSON.mapping(
          "baseMetricV2": {type: BaseMetricV2, nilable: true},
          "baseMetricV3": {type: BaseMetricV3, nilable: true},
        )
      end
      JSON.mapping(
        "cve": {type: CVE, nilable: true},
        "configurations": {type: Configurations, nilable: true},
        "impact": {type: Impact, nilable: true},
        "publishedDate": {type: String, nilable: true},
        "lastModifiedDate": {type: String, nilable: true},
      )
    end
    JSON.mapping(
      "cve_data_type": {key: "CVE_data_type", type: String, nilable: true},
      "cve_data_format": {key: "CVE_data_format", type: String, nilable: true},
      "cve_data_version": {key: "CVE_data_version", type: String, nilable: true},
      "cve_data_numberofcves": {key: "CVE_data_numberOfCVEs", type: String, nilable: true},
      "cve_data_timestamp": {key: "CVE_data_timestamp", type: String, nilable: true},
      "cve_items": {key: "CVE_Items", type: Array(CVE_Items), nilable: true},
    )
  end
end

Improving the JSON parsing time

In Crystal’s development mode, mapping JSON files (2018.json containing around 200MB of data) into an object took 30 seconds. In release mode it took 5 seconds. This performance was pretty good already but I’d like it to be faster as the datasets would get larger over time. The first thing I tried was changing the class to a struct (which is being used now). That had minimal impact on performance. Next I changed from using the File.open() method to the File.read() method which improved the file read speed and brought the parse time down to under a second. From this we learned that it’s much faster to open a file in read mode, than in read and write mode. When writing code we know to only ask for read permissions except when we also need to write to it. There are probably more file optimizations we could try, although that’s a topic of it’s own.

Inserting the data into SQLite

Mapping JSON files onto an object was only half the challenge. I also needed an efficient way to insert data into a SQLite Database. In my first attempt, I tried iterating over the various arrays. This involved several individual queries and was slow. On my Macbook it took around five minutes. To make matters worse it never finished on a Linux Laptop. After asking for help, I learned that I could group these queries into one bulk transaction. I came up with the following code that ran in under a second.

require "json"
require "sqlite3"
require "./cve_data_entity.cr"
module MyProgram
  VERSION = "0.1.0"
  filepath = "./src/example-json-files/example-full-2019-dataset.json"
  myobject = CVE_Data_Entity.from_json(File.read(filepath))
  DB.open "sqlite3://./src/example-json-files/dbname.sqlt" do |db|
    db.transaction do |tx|
      tx.begin_transaction
      myobject.try(&.cve_items).try(&.each do |item|
        # Insert General Information into the Database
        cve_item_id = item.try(&.cve).try(&.cve_data_meta).try(&.id) || "Not available"
        data_type = item.try(&.cve).try(&.data_type) || "Not available"
        data_format = item.try(&.cve).try(&.data_format) || "Not available"
        data_version = item.try(&.cve).try(&.data_version) || "Not available"
        published_date = item.try(&.publishedDate) || "Not available"
        last_modified_date = item.try(&.lastModifiedDate) || "Not available"
        tx.connection.exec("INSERT INTO GENERAL_INFORMATION (\"ID\", \"DATA_TYPE\", \"DATA_FORMAT\", \"DATA_VERSION\", \"PUBLISHDATE\", \"LASTMODIFIEDDATE\") VALUES (?, ?, ?, ?, ?, ?)", [cve_item_id, data_type, data_format, data_version, published_date, last_modified_date])
      end)
      tx.commit
    end
  end
end

Admittedly the .try() method calls can be a bit messy and we’re looking into cleaner ways to write this type of code. One recommendation was to port the JSON .dig() method to my struct, in the future I might attempt that. Read about capturing blocks and procs in the Crystal Documentation if you’re having trouble reading the above code. This method is not ideal when working with larger amounts of code. Although other than the readability issues, there was not a noticeable performance impact.

Conclusion

By writing a parser and database insertion logic in Crystal we were forced to use better coding practices. We learned about SQLite transactions, and saved about 14 hours on our database’s build time. The database now builds in under a second. If you have a similar challenge in your organization, consider trying to solve it using Crystal. Consider sharing your story in a comment below.