How to read encrypted Google Chrome cookies in C#

A web browser with several tabs and icons is displayed.

Recently at work I needed to write a few bots/scrapers for websites that do not have an official API or bot support. Emulating browser-based logins without triggering anti-bot checks is challenging. To get around this, we log in from a web browser on the Windows Server and copy its cookies from the SQLite database that stores them. This blog post explains how to read encrypted Google Chrome cookies in C# programs.

Reading cookies from Google Chrome (or other web browsers installed on the system) is controversial in some programming communities, given the risk of this knowledge being used in malware. As a result, some communities decline to answer this question on ethical grounds. I see the knowledge as a tool, and how you use it is your decision.

System Requirements

This blog post assumes you have Google Chrome (or a fork such as Brave Browser) built on Chromium 80 or newer installed. It is also written for Windows 10 users, as the Windows Data Protection API is used to protect cookies. While it's written with .NET Core in mind, you probably won't be able to run this code on macOS or Linux, and significant changes would be needed to use it on those platforms. I've only tested this code on Windows 10 and Windows Server 2019. I did most of my testing on Brave (then switched the paths to Google Chrome on the server).

How cookies were encrypted in Chrome version 79 and lower

Prior to the release of Google Chrome version 80, the browser relied directly on the Windows Data Protection API to encrypt and decrypt cookie values. Any time you needed to encrypt or decrypt a cookie, you would pass the value to the Windows Data Protection API and await its response.

The encryption is designed to prevent other users on the same computer from copying your cookies and using them to access your online accounts. Your Windows password and some other local data are used to derive a key for the Windows Data Protection API (DPAPI); without your Windows password, only a local administrator can access data protected with DPAPI.

You could use the following code snippet to decrypt a cookie in Chrome 79 and lower. You'll of course need to fetch the encrypted value from SQLite, although that's outside the scope of this blog post.

using System.Security.Cryptography;
...
// Chrome 79 and lower: the cookie value is a DPAPI blob tied to the current Windows user.
byte[] plaintext = ProtectedData.Unprotect(cookie.EncryptedValue, null, DataProtectionScope.CurrentUser);
...

How Google Chrome version 80 changes the cookie encryption process

According to Arun on StackOverflow: “Starting Chrome 80 version, cookies are encrypted using the AES256-GCM algorithm, and the AES encryption key is encrypted with the DPAPI encryption system, and the encrypted key is stored inside the ‘Local State’ file.”.

This means that passing a cookie to DPAPI directly will not work anymore. Instead, only the encryption key is encrypted using DPAPI. To decrypt a cookie's encrypted value you need to get the encryption key from the 'Local State' file, decrypt it with DPAPI, and then use other tools to run AES-256-GCM decryption. These changes were made to improve the security of the Chromium platform, although they break many third-party tools that rely on data from Chromium databases.

C# has a thriving package ecosystem, and finding packages to do this was easy. Your resulting code should look something like the following:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Linq;
using Microsoft.EntityFrameworkCore;
using System.Security.Cryptography;
using Newtonsoft.Json.Linq;
using Org.BouncyCastle.Crypto;
using Org.BouncyCastle.Crypto.Engines;
using Org.BouncyCastle.Crypto.Modes;
using Org.BouncyCastle.Crypto.Parameters;

namespace BraveBrowserCookieReaderDemo
{
    public class BraveCookieReader
    {
        public IEnumerable<Tuple<string, string>> ReadCookies(string hostName)
        {
            if (hostName == null) throw new ArgumentNullException("hostName");

            using var context = new BraveCookieDbContext();

            var cookies = context
                .Cookies
                .Where(c => c.HostKey.Equals(hostName))
                .AsNoTracking();

            // Big thanks to https://stackoverflow.com/a/60611673/6481581 for answering how Chrome 80 and up changed the way cookies are encrypted.

            // The AES-256-GCM key is stored base64 encoded in the 'Local State' JSON file,
            // prefixed with the literal bytes "DPAPI" (hence the Skip(5)) and protected with DPAPI.
            string encKey = File.ReadAllText(System.Environment.GetEnvironmentVariable("LOCALAPPDATA") + @"\BraveSoftware\Brave-Browser\User Data\Local State");
            encKey = JObject.Parse(encKey)["os_crypt"]["encrypted_key"].ToString();
            var decodedKey = System.Security.Cryptography.ProtectedData.Unprotect(Convert.FromBase64String(encKey).Skip(5).ToArray(), null, System.Security.Cryptography.DataProtectionScope.LocalMachine);

            foreach (var cookie in cookies)
            {

                var data = cookie.EncryptedValue;

                // 3 = length of the "v10" version prefix that precedes the 12 byte nonce.
                var decodedValue = _decryptWithKey(data, decodedKey, 3);


                yield return Tuple.Create(cookie.Name, decodedValue);
            }
        }


        private string _decryptWithKey(byte[] message, byte[] key, int nonSecretPayloadLength)
        {
            const int KEY_BIT_SIZE = 256;
            const int MAC_BIT_SIZE = 128;
            const int NONCE_BIT_SIZE = 96;

            if (key == null || key.Length != KEY_BIT_SIZE / 8)
                throw new ArgumentException(String.Format("Key needs to be {0} bit!", KEY_BIT_SIZE), "key");
            if (message == null || message.Length == 0)
                throw new ArgumentException("Message required!", "message");

            using (var cipherStream = new MemoryStream(message))
            using (var cipherReader = new BinaryReader(cipherStream))
            {
                var nonSecretPayload = cipherReader.ReadBytes(nonSecretPayloadLength);
                var nonce = cipherReader.ReadBytes(NONCE_BIT_SIZE / 8);
                var cipher = new GcmBlockCipher(new AesEngine());
                var parameters = new AeadParameters(new KeyParameter(key), MAC_BIT_SIZE, nonce);
                cipher.Init(false, parameters);
                var cipherText = cipherReader.ReadBytes(message.Length);
                var plainText = new byte[cipher.GetOutputSize(cipherText.Length)];
                try
                {
                    var len = cipher.ProcessBytes(cipherText, 0, cipherText.Length, plainText, 0);
                    cipher.DoFinal(plainText, len);
                }
                catch (InvalidCipherTextException)
                {
                    return null;
                }
                return Encoding.Default.GetString(plainText);
            }
        }
    }
}

Solution

You can access my full and final solution on GitHub (@irlcatgirl/BraveCookieReaderDemo), where I used the techniques discussed in this post to write a full application which reads cookies and their encrypted values from Brave Browser (a privacy-friendly fork of Google Chrome). It also covers the pieces not explained here, such as using EF Core to access Google Chrome's SQLite database and how to create a temp copy of it. I hope you found this post informative and helpful.
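
For reference, here is a minimal sketch of what the BraveCookieDbContext used above can look like. This is not the exact code from the repository: it assumes the Microsoft.EntityFrameworkCore.Sqlite provider, and the file path, temp-copy approach, and column mappings are illustrative (the Chromium cookies table stores host_key, name, and encrypted_value columns).

using System;
using System.IO;
using Microsoft.EntityFrameworkCore;

namespace BraveBrowserCookieReaderDemo
{
    // Maps the relevant columns of Chromium's "cookies" table.
    public class Cookie
    {
        public string HostKey { get; set; }
        public string Name { get; set; }
        public byte[] EncryptedValue { get; set; }
    }

    public class BraveCookieDbContext : DbContext
    {
        public DbSet<Cookie> Cookies { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder options)
        {
            var source = Path.Combine(
                Environment.GetEnvironmentVariable("LOCALAPPDATA"),
                @"BraveSoftware\Brave-Browser\User Data\Default\Cookies");

            // Work from a temporary copy so the browser's lock on the live
            // SQLite file doesn't interfere with our reads.
            var tempCopy = Path.Combine(Path.GetTempPath(), "Cookies.sqlite");
            File.Copy(source, tempCopy, overwrite: true);

            options.UseSqlite($"Data Source={tempCopy}");
        }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            modelBuilder.Entity<Cookie>(entity =>
            {
                entity.ToTable("cookies");
                entity.HasNoKey();
                entity.Property(c => c.HostKey).HasColumnName("host_key");
                entity.Property(c => c.Name).HasColumnName("name");
                entity.Property(c => c.EncryptedValue).HasColumnName("encrypted_value");
            });
        }
    }
}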

References

This blog post would not have been possible without help from the following resources and individuals.

Easily paginating your EntityFramework Core Queries with C# Generics

Recently I had the challenge of paginating a web application I wrote, as the tables displaying data were getting quite long and I needed a way to display things more cleanly. This post details how I solved the problem using C# generics. It includes plenty of code snippets so you can follow along in your own application.

The code shown in this post was written for my client Universal Layer and is released by them under a BSD 3-Clause License.

What is pagination?

Pagination is the process of taking a collection of objects and splitting it into pages. For example, a book might contain hundreds of pages but only fifty words per page. You cannot put an entire book on a single piece of paper, and you should not attempt the equivalent in software. Instead you put a set amount of words on each page, create a list of the pages, and have a way to easily switch between them. In the physical world this works by binding the pages of a book together; pagination in software works similarly by binding pages of data together into an easy-to-use object.

How to paginate in computer software

Pagination requires three pieces of information: a collection of data to paginate, the number of results per page (a limiter), and the specific page being requested. Object-oriented languages such as C# make this easy: you can pass the data to an object's constructor, have it run the calculations on your behalf, and expose the results as read-only properties of the object.

The properties generated by the constructor are as follows (a short worked example follows the list):

  • Item Count: Number of items in our collection. When passing an IQueryable this means the number of rows in a database table.
  • Page Count: When you divide the number of items by the number of results per page you get the Page Count or number of pages.
  • Skip: Number of items to skip in our SQL query
  • Take: Number of items to select in our SQL query
  • The page of the selected results
  • Number of First Page
  • Number of Last Page
  • Number of Next Page
  • Number of Current Page
  • Number of Previous Page
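
To make the math concrete, here is a quick worked example with made-up numbers (95 items, 10 results per page, requesting page 3):

// 95 items at 10 results per page, requesting page 3 (illustrative numbers).
int itemCount = 95;
int resultsPerPage = 10;
int pageNumber = 3;

int pageCount = (int)Math.Ceiling((double)itemCount / resultsPerPage); // 10 pages
int skip = (pageNumber - 1) * resultsPerPage;                          // skip the first 20 rows
int take = resultsPerPage;                                             // then take rows 21 through 30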

Security Considerations for User Configurable Pagination

Some developers may allow users to choose the number of results per page in a table, API, etc. Be sure to set a reasonable maximum number of results per page and enforce it in your backend code. Failure to add a safe maximum limit could allow large queries that overwhelm the database server, resulting in a denial-of-service vulnerability.
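
A minimal way to enforce such a limit before the value ever reaches the pagination code (the constant and method names here are illustrative, not part of the PagedResults class below):

// Clamp user-supplied page sizes before they are used to build a query.
private const int MaxResultsPerPage = 100;

private static int ClampResultsPerPage(int requested)
{
    if (requested < 1) return 1;
    return Math.Min(requested, MaxResultsPerPage);
}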

Solution

I decided that the best way to solve this problem was to pass the necessary data to the constructor and have the constructor do all of the math and fill in the properties.

From there, getting data from the object's properties is much easier than calculating it in each controller, and it results in shorter code.

The final result of my efforts was a generic class; the code for it is below.

PagedResults.cs

// This code is released by Universal Layer under a BSD 3-Clause License
// https://github.com/ulayer/PagedResults.cs
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

namespace Your.Application.Models
{
    public class PagedResults<T>
    {
        public int ItemCount { get; }
        public int PageCount { get; }
        
        public int Skip { get; }
        public int Take { get; }
        
        public IEnumerable<T> PageOfResults { get; }
        
        public int FirstPage { get; }
        public int LastPage { get; }
        
        public int NextPage { get; }
        public int CurrentPage { get; }
        public int PreviousPage { get; }

        public PagedResults(IQueryable<T> results, int pageNumber, int resultsPerPage)
        {
            ItemCount = results.Count();
            PageCount = (int) Math.Ceiling((double) ItemCount / resultsPerPage);
            
            Skip = (pageNumber - 1) * resultsPerPage;
            Take = resultsPerPage;
            
            PageOfResults = results.Skip(Skip).Take(Take).ToList();
            
            FirstPage = 1;
            LastPage = PageCount == 0 ? 1 : PageCount;
            
            NextPage = Math.Min(pageNumber + 1, LastPage);
            CurrentPage = pageNumber;
            PreviousPage = Math.Max(pageNumber - 1, FirstPage);
        }
    }
}

I also worked on a few examples for this post of how you could take advantage of this class. I hope you find them useful when integrating this class into your applications.

Within a service

It is a well-accepted design pattern to call EF Core from within a service. It's still possible to get an IQueryable by injecting EF Core's context into your page, but you shouldn't do this when services exist. By asking for data from the database through a service you keep your Razor Pages code-behind more organized: if you have to call several methods on your EF Core context to get back the desired data, it doesn't clutter the code-behind.

Where we initially created a generic class, we can now use it as a PagedResults<Customer> class to paginate data from the Customer class. The Customer class can come from anywhere; the important thing is that the collection of Customers is passed to our PagedResults<Customer> class as an IQueryable<T> (EF Core does this for you). The helpful thing about using a generic class is that we can change the type as we need paged results for new data types, without having to write an additional PagedResults class for each new type.

public PagedResults<Customer> GetPagedResults(int pageNumber, int resultsPerPage)
{
    return new PagedResults<Customer>(
        _Context.Customers,
        pageNumber,
        resultsPerPage);
}
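
For context, here is a rough sketch of a Razor Pages code-behind that calls such a service and exposes the PagedResults property which the partial view in the next section expects. The CustomerService type is illustrative; adapt it to however you register your services.

using Microsoft.AspNetCore.Mvc.RazorPages;
using Your.Application.Models;

public class CustomersModel : PageModel
{
    private readonly CustomerService _customerService;

    public CustomersModel(CustomerService customerService)
    {
        _customerService = customerService;
    }

    // The Shared/_Pagination partial reads Model.PagedResults.
    public PagedResults<Customer> PagedResults { get; private set; }

    public void OnGet(int pageNumber = 1, int resultsPerPage = 25)
    {
        PagedResults = _customerService.GetPagedResults(pageNumber, resultsPerPage);
    }
}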

Passing the results to a RazorPages Partial View

I came up with the following partial view for use in Razor Pages. It requires that the model of the calling Razor Page has a property called PagedResults. The partial view reads the model dynamically, so if that property doesn't exist in the model your program will throw an exception. As long as PagedResults exists in the page's model you can inject the pagination anywhere in your view as <partial name="Shared/_Pagination" />.

It's important to note that this is a partial view, not a full Razor Page. To keep it simple, Shared/_Pagination.cshtml exists but Shared/_Pagination.cshtml.cs does not.

<nav aria-label="Page navigation example">
    <ul class="pagination">
        @if (@Model.PagedResults.CurrentPage != 1) // Show a link to the first page as well as previous page as long as we are not on the first page.
        {
            <li class="page-item"><a href="./@Model.PagedResults.FirstPage" class="page-link">First</a></li>
            <li class="page-item">
                <a href="./@Model.PagedResults.PreviousPage" class="page-link">
                    <span aria-hidden="true">&laquo;</span>
                    <span class="sr-only">Previous</span>
                </a></li>
        }
        
        @{ var pageCount = @Model.PagedResults.PageCount; }
        
        @for (int i = 1; i <= pageCount && i < 10; i++)
        {
            var currentPage = @Model.PagedResults.CurrentPage;
            if (pageCount > 10)
            {
                var activePage = ((currentPage - 5) + i);
                var active = activePage == currentPage ? "active" : string.Empty;
                if (activePage <= (pageCount - 1) && (activePage > 0))
                {
                    <li class="page-item @active"><a href="./@activePage" class="page-link">@activePage</a></li>
                }
            }
            else
            {
                var active = i == currentPage ? "active" : string.Empty;
                <li class="page-item @active"><a href="./@i" class="page-link">@i</a></li>
            }
        }
        
        
        
        @if (@Model.PagedResults.CurrentPage != @Model.PagedResults.LastPage)
        {
            <li class="page-item"><a href="./@Model.PagedResults.NextPage" class="page-link">
                <span aria-hidden="true">&raquo;</span>
                <span class="sr-only">Next</span>
            </a></li>
            <li class="page-item"><a href="./@Model.PagedResults.LastPage" class="page-link">Last</a></li>
        }
    </ul>
</nav>

How to use Tor as your System DNS Resolver

Recently I posted criticism of Mozilla's new DNS over HTTPS feature, given that they disabled its primary security functionality. The user isn't even warned and can be silently spied on. This blog post details how to use Tor as your system DNS resolver, with instructions for each operating system plus instructions for disabling Firefox's dangerous DNS over HTTPS implementation. If you'd like to read why Firefox's implementation of DNS over HTTPS is harmful, you may read my previous blog post.

Note for Firefox Users

By default, Mozilla enables DNS over HTTPS on networks that do not request the feature to be disabled. Visit about:config and set network.trr.mode to 5 to completely turn off the feature. I do not trust Mozilla's implementation and you shouldn't either.

Why not use Tor Browser?

Where possible you should download Tor Browser and use it instead. Unfortunately, many websites block the Tor network or show its users a large number of CAPTCHAs (imagine having to check "I'm not a robot" every few minutes; that's the reality for many Tor Browser users).

This alternative solution at least doesn't disable DNS security when network administrators are uncomfortable with it, and website owners can still see your real IP address, which reduces the number of CAPTCHAs you will see. I will emphasize that it is not as private as the Tor Browser Bundle; please keep this in mind if you use this approach.

How to use Tor as your System DNS Resolver on Windows 10

At this time the tooling available on Windows 10 is not in a state where I'm comfortable writing out steps, as I am unsure about several of the security implications. As a temporary workaround I would recommend buying a Raspberry Pi, setting up Linux and a DNS resolver on it, and following the steps below for using Tor on Linux.

How to use Tor as your System DNS Resolver on macOS

Step 0) Install the Homebrew Package Manager

Open the Terminal app on macOS and run the following command: /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)". Follow the prompts and let the package manager install itself. This may take a few minutes to download and configure everything, as Homebrew relies on the Xcode developer tools, which can be quite large.

Step 1) Install the Tor and DNSMasq Homebrew Packages

To get started you will need to run the following two commands: brew install tor and brew install dnsmasq. This installs packages for Tor and DNSMasq (a small DNS proxy).

Step 2) Enable Tor’s DNS Resolver

Open /usr/local/etc/tor/torrc with a text editor of your choice. I recommend running nano as root to avoid any permission issues, so run sudo nano /usr/local/etc/tor/torrc and add the line DNSPort 9053 to the bottom. Then run brew services restart tor to restart the Tor service, reload the configuration, and make sure the resolver is enabled.

Step 3) Configure DNSMasq

You will need to configure DNSMasq to send your DNS queries to the Tor DNS resolver, as it runs on a non-standard port. To do this, run nano /usr/local/etc/dnsmasq.conf and add the following lines to the bottom of the file: no-resolv (to disable fetching DNS servers from /etc/resolv.conf and /etc/hosts) and server=127.0.0.1#9053, as shown in the sketch below. Save the file and run sudo brew services restart dnsmasq. Since dnsmasq listens on a privileged port (a port below 1024) it must run as root or as a user with special permissions; this is the standard configuration for dnsmasq on macOS systems.
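
The additions to dnsmasq.conf end up looking like this:

# Appended to dnsmasq.conf:

# Don't read upstream servers from /etc/resolv.conf or /etc/hosts
no-resolv
# Forward every query to Tor's DNSPort on localhost
server=127.0.0.1#9053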

How to use Tor as your System DNS Resolver on Linux

Configuring Tor as your System DNS Resolver on Linux is a bit complex. These instructions only have Debian and Ubuntu in mind. If you use a different Linux distribution you’ll need to do your own research to get things working.

Install Tor

For security reasons you should always install Tor from the Tor Project's official repositories; the version in the Ubuntu/Debian apt repos is outdated at best. To install and configure Tor, run the commands below (see the sketch that follows), then complete the final step:
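
The repository setup below is only a sketch of the usual Tor Project instructions; the release codename (buster here) and the archive signing key change over time, so check the official deb.torproject.org documentation before copying it verbatim.

# Sketch only -- verify against the official deb.torproject.org instructions.
sudo apt install apt-transport-https curl gnupg

# Add the Tor Project repository for your release (replace buster as needed).
echo "deb https://deb.torproject.org/torproject.org buster main" | \
    sudo tee /etc/apt/sources.list.d/tor.list

# Import the archive signing key (fingerprint published in the official
# instructions at the time of writing), then install Tor and the keyring.
curl -s https://deb.torproject.org/torproject.org/A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89.asc | \
    sudo apt-key add -
sudo apt update
sudo apt install tor deb.torproject.org-keyring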

  • Finally, open /etc/tor/torrc with a text editor of your choice. I recommend running nano as root to avoid any permission issues, so run sudo nano /etc/tor/torrc and add the line DNSPort 9053 to the bottom. Then run sudo service tor restart to restart the Tor service, reload the configuration, and make sure the resolver is enabled.

Install dnsmasq to accept requests and forward them to the Tor DNS Resolver

You will need to configure DNSMasq to send your DNS queries to the Tor DNS resolver, as it runs on a non-standard port. To do this, run sudo nano /etc/dnsmasq.conf and add the following lines to the bottom of the file: no-resolv (to disable fetching DNS servers from /etc/resolv.conf and /etc/hosts) and server=127.0.0.1#9053. I also recommend binding to specific interfaces and using the IP address 127.0.0.54 to avoid conflicts with other services running on your machine; a sketch of the resulting configuration follows. Save the file and run sudo service dnsmasq restart.
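
A sketch of the resulting /etc/dnsmasq.conf additions:

# Listen only on a dedicated loopback address to avoid clashing with
# other local DNS services.
listen-address=127.0.0.54
bind-interfaces

# Don't read upstream servers from /etc/resolv.conf or /etc/hosts
no-resolv
# Forward every query to Tor's DNSPort
server=127.0.0.1#9053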

Remove systemd-resolved and have network manager use dnsmasq instead

Newer versions of Ubuntu have integrated systemd-resolved, a built-in caching DNS resolver, into systemd. This can cause problems with our DNS setup, so it's best to disable it where possible. These instructions are adapted from an answer on AskUbuntu; I've tested them on my personal computer but didn't write or research them. Be aware that this will break some corporate VPN clients (see the LaunchPad issue).

  • Run sudo systemctl disable systemd-resolved and sudo systemctl stop systemd-resolved in a terminal.
  • Next run sudo nano /etc/NetworkManager/NetworkManager.conf and add the following line after the [main] section: dns=default.
  • Run sudo rm /etc/resolv.conf and then sudo systemctl restart NetworkManager. Don't worry, as this will create a new resolv.conf file.

Final Steps

Be sure to go into your network settings and set your DNS resolver to 127.0.0.54, and then things will work as expected.

How to automate your own backups with rclone and crontab on any Unix/Linux based computer

Data is backed up to a tape

I've been migrating away from Google's cloud-based software. I have concerns about the security of my data, and I want access to my documents when the Google Cloud or my internet connection is having issues. I was able to download all of my data from Google Drive easily, although this creates a new problem: I'm now responsible for my own backups again. Without a backup solution you risk losing your important documents. This post discusses how I set up backups with rclone and how you can do the same.

This tutorial is written with only Unix/Linux-based computers in mind. You might be able to get this working on Windows if you do your own research and experiments, but this tutorial is not intended for Windows users and I cannot help if something goes wrong there.

Where to store the backups

I did considerable research into where to store my backups and decided on Backblaze. They provide 10GB of storage for free and then charge just $0.005/GB/month. With a provider chosen, I installed a free and open source program called rclone. rclone works like rsync, except for cloud storage providers. I was able to get started with it in just 15 minutes.

Create a Backblaze account and bucket for use with backups with rclone

To create a Backblaze account, visit their signup page. You will need to answer a few questions about yourself and provide an email address, phone number, and a credit card for billing purposes. Once your account is set up and verified, visit the My Account area. Click "create a bucket" and choose a name for it. Make sure the privacy setting is set to "private", or anyone who guesses the bucket name will be able to list and download your documents without needing a valid app key with access to your account. Once your bucket is created, visit the App Keys section and create a key with access to your bucket. Write down the key ID and secret; you'll need them later when setting up rclone.

Install and Configure rclone to use your bucket

Visit rclone's installation page (or on Debian Linux type sudo apt install rclone in a terminal window to save some time) and follow the instructions. Once it's installed, open a terminal and run rclone config. It will ask you several questions; I chose the name b2 and then followed the prompts. I recommend that you DO NOT enable the hard delete option, and instead allow your bucket to keep all object versions. It's a bit more expensive, as you'll store multiple copies of all documents, but it is useful in the event you delete a document by mistake and the deletion syncs before you notice.
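
Before relying on the new remote, you can quickly confirm it works (assuming you named it b2 as above):

# Lists the buckets visible to the remote; your backup bucket should appear.
rclone lsd b2: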

Configure crontab for automated backups with rclone

In your terminal, type crontab -e and go to the bottom of the crontab file. Type in the line @hourly rclone sync /home/name/Documents/ b2:name/ --verbose, replacing the folder path with your documents folder path and the bucket name with your bucket name (see the sketch below). Most likely the crontab file will open in nano; you can read how to use nano on the Gentoo Linux Wiki, and if it opens a different text editor then Google is your friend. Afterward, save and close the file and your automatic backups will be in effect. You might want to run your first backup manually to make sure everything syncs properly without any issues. Check your backups on a regular basis and ensure they work as expected; don't wait until a data loss incident to find out whether your backups work.
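
Putting that together, a manual first run and the crontab entry look something like this (the paths and bucket name are the placeholders from above; replace them with your own):

# Manual first run; --dry-run shows what would be copied without uploading anything.
rclone sync /home/name/Documents/ b2:name/ --verbose --dry-run
rclone sync /home/name/Documents/ b2:name/ --verbose

# The hourly job added via crontab -e:
@hourly rclone sync /home/name/Documents/ b2:name/ --verbose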

Conclusion

You can reduce costs by switching away from G Suite to your own backup solution. With a bit of work on the command line, you can roll your own rclone-based backups in under an hour.

How to use Wasabi Object Storage with Mastodon’s Amazon S3 Adapter

Wasabi is a cost-effective alternative to Amazon S3. With it you can use existing tools built for Amazon S3 at low cost. Wasabi is great for instance owners like me who don't have a small fortune to spend on cloud services. This blog post discusses the configuration I used on LGBTQIA.is, running Mastodon v3.0.1 Catgirl Edition (whose S3 adapter is, to the best of my knowledge, identical to that of vanilla Mastodon), to use Wasabi object storage.

Configuring Mastodon

I used the following Mastodon .env configuration to make Wasabi work. This configuration should go in .env.production. You'll need to set your own S3_BUCKET, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and S3_CLOUDFRONT_HOST values. You'll notice the S3_REGION is us-east-1 while the S3_ENDPOINT is https://s3.us-east-2.wasabisys.com/. This is intentional and not a typo; it's explained further at the end of the article.

S3_ENABLED=true
S3_BUCKET=YOUR_BUCKET_NAME
S3_REGION=us-east-1
AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
S3_PROTOCOL=https
S3_HOSTNAME=s3.wasabisys.com
S3_CLOUDFRONT_HOST=media.your.hostname
S3_ENDPOINT=https://s3.us-east-2.wasabisys.com/

Configuring Nginx

I chose to proxy my media traffic through nginx. This allows you to change the object storage provider, should Wasabi ever cease operations or should you become unhappy with their pricing model, without affecting remote instances that have linked to the old URLs at Wasabi. It is also nice if you want to use Cloudflare's free CDN to proxy image traffic without proxying all Mastodon traffic. Remember that you need to update server_name and your_bucket_name for this to work correctly.

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name media.your.hostname;

    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/privkey.pem;
    set $backend "https://s3.us-east-2.wasabisys.com:443";

    location / {
        resolver 1.1.1.1;
        proxy_cache mastodon_media;
        proxy_cache_revalidate on;
        proxy_buffering on;
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
        proxy_cache_background_update on;
        proxy_cache_lock on;
        proxy_cache_valid 1d;
        proxy_cache_valid 404 1h;
        proxy_ignore_headers Cache-Control;
        proxy_set_header Host 's3.us-east-2.wasabisys.com';
        add_header X-Cached $upstream_cache_status;
        proxy_pass $backend/your_bucket_name$uri;
    }
}
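
Note that proxy_cache mastodon_media assumes a cache zone with that name has been defined in the http context of your nginx configuration, along these lines (the path and sizes are illustrative):

# Goes in the http { } block, e.g. /etc/nginx/nginx.conf or a conf.d include.
proxy_cache_path /var/cache/nginx/mastodon_media levels=1:2
                 keys_zone=mastodon_media:100m max_size=10g inactive=7d;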

Known Issues with Wasabi’s S3 Implementation

When using the us-east-2 region you must still set us-east-1 as your S3_REGION while using the https://s3.us-east-2.wasabisys.com/ endpoint. Authentication fails otherwise and you'll be unable to upload objects. This is confusing behavior and I hope it's fixed in the future. My only other complaint is that Wasabi tends to have more frequent outages than Amazon S3, but that's understandable at the lower price point.

Conclusion

Wasabi provides a low-cost alternative to Amazon S3 at the price of a confusing configuration. If you've followed this tutorial correctly, your Mastodon instance will now use Wasabi. If this isn't a new instance, be sure to move your existing media files over with rclone.

What is Zsh and why you should use it instead of Bash

Zsh (short for Z shell) is, in technical terms, a UNIX command interpreter (often nicknamed a shell by the community) and, more simply, a command prompt for UNIX and Linux based computer systems.

Yesterday, Software Engineer Ali Spittel announced on Twitter that Apple’s macOS Operating System will be changing its default shell to Zsh from Bash.

I have been a long-term Zsh user, and I learned about Zsh from Thoughtbot’s laptop-setup-script on GitHub about a year or two ago. Since Zsh will become “the new normal” for macOS users, I decided to write about what it is and why you should be using it over Bash.

What advantages does Zsh bring over Bash?

Zsh borrows a lot of features and functionality from both ksh and csh, bringing in the best of both while still adding its own spin on things. A few features I liked that appear to be unique to Zsh (some of these may require you to enable them with autoload first) are as follows:

  • zmv, a built-in tool to do mass renames of files using a pattern. For example, to rename every file in a directory to an HTML file you could run zmv -C '(*)(#q.)' '$1.html'.
  • zcalc, a calculator built into zsh; it takes the same syntax as most programming languages to evaluate and return the result of a mathematical expression.
  • Expanded command syntax to allow advanced globbing and recursive searches.
  • Redirects: you can type echo "Hello World" >log.txt >&1 to print "Hello World" to log.txt and to stdout at the same time. This makes it possible to permanently log output as you see it.

Those are cool, but that's just scratching the surface; the following sections look at the bigger parts of Zsh.

A powerful plugin ecosystem

Another feature of Zsh that I am a fan of is the powerful plugin ecosystem that comes along with it. In the screenshot at the beginning of this post, I am using a terminal emulator called Hyper Terminal to run Zsh which shows the Git plugin in action as it displays a project’s branch information.

You can see a nice list of plugins on GitHub with the awesome-zsh-plugins list. If you want a suggestion of something to try out as you learn how to use zsh plugins, and you’re a frequent reader of my blog, please consider the crystal-zsh plugin. It adds a lot of useful aliases for Crystal Developers like me and saves on typing time.

Several community maintained frameworks

Getting started with Zsh can be complicated, and so can all of the necessary installs of plugins and themes. The community has recognized this and created several frameworks to help you get started faster with Zsh. For a beginner I recommend oh-my-zsh; it's easy to get started with and has quite the community behind it. It's very well documented and was easy to set up and configure.

Sweet, sweet, shell themes

Zsh supports themes: just as you can install a custom theme in your text editor, Zsh itself has themes to make your shell look as appealing as possible while you work with it. I recommend taking a look at oh-my-zsh's themes page on their wiki if you want help choosing a theme. You can even have Zsh pick a theme at random (out of your installed themes) each time you open a new session.
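
If you use oh-my-zsh, the random selection is a one-line setting in ~/.zshrc:

# ~/.zshrc (oh-my-zsh): load a random installed theme for each new session
ZSH_THEME="random"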

No long-lived command history

Unlike Bash, which is built into the current version of macOS, Zsh does not keep a long-lived command history by default. I'm not a fan of command histories because of privacy concerns; when I used Bash I had to clear mine frequently, which was annoying to say the least. While it's still possible that Apple will change their builds of Zsh to enable the command history by default (there is a setting to disable it), one reason I'm a fan of Zsh is that the command history is off by default.

The build of Zsh that I run is from the Homebrew package manager and it lacks the command history (at least by default), which has built some trust between Zsh and me. The builds on Homebrew do not include Apple's changes to Zsh, so the history concern isn't an issue for me yet. In the future it could be, but not today.

It's worth noting that when you install the oh-my-zsh framework, a command history will be enabled. Remember to adjust your zsh configuration accordingly if this is an issue for you.

This is a smaller behavior of Zsh, and while some people may dislike it, it aligns with my personal values and is yet another reason I choose to support Zsh. It shows that Zsh respects user choice and takes an opt-in approach before collecting any data about the user's activity (even locally).

Who maintains it and develops new features?

There is a Zsh mailing list where those who wish to can discuss changes to the software and contribute patches. The community development of Zsh is one of the things I like most about it. There’s no right or wrong way to submit a patch, although if you submit to the mailing list you’ll probably receive some criticism and people are free to choose whether or not to include your changes in their builds of Zsh.

Where can I learn more before I install Zsh?

I would recommend reading over awesome-zsh (not to be confused with awesome-zsh-plugins) on GitHub, it provides even more information on why Zsh is awesome. If you have a more specific question about Zsh, search engines are your friend. Ali’s article was also a good read and showed some cool things you can do.

Conclusion

This post required a lot of research, reading, and experimenting to learn about Zsh’s features and unique tools (a few I didn’t know about until the time of writing). I don’t hate Bash and I still use it on quite a few production servers of mine, but Zsh continues to impress me for my local development environment. I hope I’ve convinced you to give Zsh a try.

Crystal Lang: Mapping JSON files in under a second

Crystal Language Logo
Logo of the Crystal Language

Crystal's performance is useful in IO-intensive tasks such as mapping JSON files onto an object. Recently I worked on a project with my friend David Colombo. David needed to take JSON files, map the data onto an object, and insert it into a SQLite database. This post describes mapping JSON files in Crystal.

Background

Previously, David had created a parser in NodeJS that reads through the JSON files containing structured data and inserts the data into an SQLite database. These JSON files contain thousands of keys and are hundreds of megabytes each. Unfortunately, due to limitations of the language and some anti-patterns that JavaScript allows, the script took 14 hours to complete its task. Mapping JSON files this way is too slow. David then attempted to write a parser in Python, but it was limited to 9999 keys and couldn't meet the project's requirements. He sent me a message asking if I was still working with Ruby and whether it would be any faster. I said that while I could write a script in Ruby just fine, I couldn't predict Ruby's performance ahead of time, and that I had been experimenting with a language called Crystal (a language with Ruby-like syntax and C-like performance); I asked if he'd be open to trying it instead of Ruby. Since I've been learning Crystal, I decided to give the rewrite a shot.

A few requirements

We wanted the code to be long-lived and to require minimal changes to the project's dependencies. We also wanted to avoid third-party libraries; using only Crystal's standard library was a goal. However, the standard library does not include database drivers, so we decided to use the official shard for SQLite. It's maintained by the Crystal core team, so it will probably be well maintained.

An object to represent the JSON Files

To maximize performance I decided to use nested structs to contain the data, using Crystal's JSON.mapping() (https://crystal-lang.org/api/0.28.0/JSON.html#mapping). The data won't be changed at runtime; it is only copied into SQLite. Stack memory is cheaper than heap memory, so the drawbacks of structs were worth it, and in exchange we gained performance while mapping JSON files.

module MyProgram
  extend self
  struct CVE_Data_Entity
    struct CVE_Items
      struct CVE
        struct DataMeta
          JSON.mapping(
            "id": {key: "ID", type: String, nilable: true},
            "assigner": {key: "ASSIGNER", type: String, nilable: true}
          )
        end
        struct Affects
          struct Vendor
            struct VendorData
              struct Product
                struct Data
                  struct Version
                    struct Data
                      JSON.mapping(
                        "version_value": {type: String, nilable: true},
                        "version_affected": {type: String, nilable: true},
                      )
                    end
                    JSON.mapping(
                      "version_data": {type: Array(Data), nilable: true},
                    )
                  end
                  JSON.mapping(
                    "product_name": {type: String, nilable: true},
                    "version": {type: Version, nilable: true}
                  )
                end
                JSON.mapping(
                  "product_data": {type: Array(Data), nilable: true},
                )
              end
              JSON.mapping(
                "vendor_name": {type: String, nilable: true},
                "product": {type: Product, nilable: true},
              )
            end
            JSON.mapping(
              "vendor_data": {type: Array(VendorData), nilable: true},
            )
          end
          JSON.mapping(
            "vendor": {type: Vendor, nilable: true},
          )
        end
        struct Problemtype
          struct Data
            struct Description
              JSON.mapping(
                "lang": {type: String, nilable: true},
                "value": {type: String, nilable: true}
              )
            end
            JSON.mapping(
              "description": {type: Array(Description), nilable: true}
            )
          end
          JSON.mapping(
            "problemtype_data": {type: Array(Data), nilable: true},
          )
        end
        struct References
          struct Data
            JSON.mapping(
              "url": {type: String, nilable: true},
              "name": {type: String, nilable: true},
              "refsource": {type: String, nilable: true},
              "tags": {type: Array(String), nilable: true},
            )
          end
          JSON.mapping(
            "reference_data": {type: Array(Data), nilable: true},
          )
        end
        struct Description
          struct Data
            JSON.mapping(
              "lang": {type: String, nilable: true},
              "value": {type: String, nilable: true},
            )
          end
          JSON.mapping(
            "description_data": {type: Array(Data), nilable: true}
          )
        end
        JSON.mapping(
          "data_type": {type: String, nilable: true},
          "data_format": {type: String, nilable: true},
          "data_version": {type: String, nilable: true},
          "cve_data_meta": {key: "CVE_data_meta", type: DataMeta, nilable: true},
          "affects": {type: Affects, nilable: true},
          "problemtype": {type: Problemtype, nilable: true},
          "references": {type: References, nilable: true},
          "description": {type: Description, nilable: true},
        )
      end
      struct Configurations
        struct Nodes
          struct CPE
            JSON.mapping(
              "vulnerable": {type: Bool, nilable: true},
              "cpe23Uri": {type: String, nilable: true},
            )
          end
          JSON.mapping(
            "operator": {type: String, nilable: true},
            "cpe_match": {type: Array(CPE), nilable: true},
          )
        end
        JSON.mapping(
          "cve_data_version": {key: "CVE_data_version", type: String, nilable: true},
          "nodes": {type: Array(Nodes), nilable: true},
        )
      end
      struct Impact
        struct BaseMetricV3
          struct CvssV3
            JSON.mapping(
              "version": {type: String, nilable: true},
              "vectorString": {type: String, nilable: true},
              "attackVector": {type: String, nilable: true},
              "attackComplexity": {type: String, nilable: true},
              "privilegesRequired": {type: String, nilable: true},
              "userInteraction": {type: String, nilable: true},
              "scope": {type: String, nilable: true},
              "confidentialityImpact": {type: String, nilable: true},
              "integrityImpact": {type: String, nilable: true},
              "availabilityImpact": {type: String, nilable: true},
              "baseScore": {type: Float64, nilable: true},
              "baseSeverity": {type: String, nilable: true},
            )
          end
          JSON.mapping(
            "cvssV3": {type: CvssV3, nilable: true},
            "exploitabilityScore": {type: Float64, nilable: true},
            "impactScore": {type: Float64, nilable: true},
          )
        end
        struct BaseMetricV2
          struct CvssV2
            JSON.mapping(
              "version": {type: String, nilable: true},
              "vectorString": {type: String, nilable: true},
              "accessVector": {type: String, nilable: true},
              "accessComplexity": {type: String, nilable: true},
              "authentication": {type: String, nilable: true},
              "confidentialityImpact": {type: String, nilable: true},
              "integrityImpact": {type: String, nilable: true},
              "availabilityImpact": {type: String, nilable: true},
              "baseScore": {type: Float64, nilable: true},
            )
          end
          JSON.mapping(
            "cvssV2": {type: CvssV2, nilable: true},
            "severity": {type: String, nilable: true},
            "exploitabilityScore": {type: Float64, nilable: true},
            "impactScore": {type: Float64, nilable: true},
            "acInsufInfo": {type: Bool, nilable: true},
            "obtainAllPrivilege": {type: Bool, nilable: true},
            "obtainUserPrivilege": {type: Bool, nilable: true},
            "obtainOtherPrivilege": {type: Bool, nilable: true},
            "userInteractionRequired": {type: Bool, nilable: true},
          )
        end
        JSON.mapping(
          "baseMetricV2": {type: BaseMetricV2, nilable: true},
          "baseMetricV3": {type: BaseMetricV3, nilable: true},
        )
      end
      JSON.mapping(
        "cve": {type: CVE, nilable: true},
        "configurations": {type: Configurations, nilable: true},
        "impact": {type: Impact, nilable: true},
        "publishedDate": {type: String, nilable: true},
        "lastModifiedDate": {type: String, nilable: true},
      )
    end
    JSON.mapping(
      "cve_data_type": {key: "CVE_data_type", type: String, nilable: true},
      "cve_data_format": {key: "CVE_data_format", type: String, nilable: true},
      "cve_data_version": {key: "CVE_data_version", type: String, nilable: true},
      "cve_data_numberofcves": {key: "CVE_data_numberOfCVEs", type: String, nilable: true},
      "cve_data_timestamp": {key: "CVE_data_timestamp", type: String, nilable: true},
      "cve_items": {key: "CVE_Items", type: Array(CVE_Items), nilable: true},
    )
  end
end

Improving the JSON parsing time

In Crystal's development mode, mapping a JSON file (2018.json, containing around 200MB of data) onto an object took 30 seconds. In release mode it took 5 seconds. That was already pretty good, but I wanted it to be faster, as the datasets will only get larger over time. The first thing I tried was changing the class to a struct (which is what is used now); that had minimal impact on performance. Next I switched from the File.open() method to the File.read() method, which improved the file read speed and brought the parse time down to under a second. From this we learned that it's much faster to open a file in read mode than in read and write mode; when writing code, only ask for read permissions unless you also need to write. There are probably more file optimizations we could try, although that's a topic of its own.
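
As a rough illustration of that change (using the CVE_Data_Entity mapping from the previous section and 2018.json as the input file):

# Original approach: hand the open file handle to the JSON mapping.
data = File.open("2018.json") do |file|
  CVE_Data_Entity.from_json(file)
end

# Faster in our tests: read the whole file into a String first, then parse it.
data = CVE_Data_Entity.from_json(File.read("2018.json"))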

Inserting the data into SQLite

Mapping JSON files onto an object was only half the challenge; I also needed an efficient way to insert the data into a SQLite database. In my first attempt I tried iterating over the various arrays, which meant several individual queries and was slow: on my MacBook it took around five minutes, and to make matters worse it never finished on a Linux laptop. After asking for help, I learned that I could group these queries into one bulk transaction. I came up with the following code, which ran in under a second.

require "json"
require "sqlite3"
require "./cve_data_entity.cr"
module MyProgram
  VERSION = "0.1.0"
  filepath = "./src/example-json-files/example-full-2019-dataset.json"
  myobject = CVE_Data_Entity.from_json(File.read(filepath))
  DB.open "sqlite3://./src/example-json-files/dbname.sqlt" do |db|
    db.transaction do |tx|
      tx.begin_transaction
      myobject.try(&.cve_items).try(&.each do |item|
        # Insert General Information into the Database
        cve_item_id = item.try(&.cve).try(&.cve_data_meta).try(&.id) || "Not available"
        data_type = item.try(&.cve).try(&.data_type) || "Not available"
        data_format = item.try(&.cve).try(&.data_format) || "Not available"
        data_version = item.try(&.cve).try(&.data_version) || "Not available"
        published_date = item.try(&.publishedDate) || "Not available"
        last_modified_date = item.try(&.lastModifiedDate) || "Not available"
        tx.connection.exec("INSERT INTO GENERAL_INFORMATION (\"ID\", \"DATA_TYPE\", \"DATA_FORMAT\", \"DATA_VERSION\", \"PUBLISHDATE\", \"LASTMODIFIEDDATE\") VALUES (?, ?, ?, ?, ?, ?)", [cve_item_id, data_type, data_format, data_version, published_date, last_modified_date])
      end)
      tx.commit
    end
  end
end

Admittedly the .try() method calls can be a bit messy, and we're looking into cleaner ways to write this type of code. One recommendation was to port the JSON .dig() method to my struct; in the future I might attempt that. Read about capturing blocks and procs in the Crystal documentation if you're having trouble reading the above code. This approach is not ideal when working with larger amounts of code, but other than the readability issues there was no noticeable performance impact.

Conclusion

By writing the parser and database insertion logic in Crystal we were forced to use better coding practices. We learned about SQLite transactions and cut about 14 hours off our database's build time; the database now builds in under a second. If you have a similar challenge in your organization, consider trying to solve it with Crystal, and consider sharing your story in a comment below.

Crystal Lang: Macros and how they’re useful

Crystal Language Logo
Logo of Crystal Language

The Crystal programming language includes a feature called macros. Crystal's documentation states: "Macros are methods that receive AST nodes at compile-time and produce code that is pasted into a program." Put simply, this means you can write code that writes more code. This post is a deep dive into how to write macros and why they're useful.

What’s an Abstract Syntax Tree (AST)?

To understand how Macros work, you should be familiar with the concept of an Abstract Syntax Tree.

In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language.

Wikipedia https://en.wikipedia.org/wiki/Abstract_syntax_tree

Before a compiler can do its work it reads through all of the source files. While reading, it removes unnecessary characters, spacing, and so on, and adds useful annotations such as line numbers in case it encounters an error; this is how stack traces work internally. Normally this process would involve changing the files somehow, but instead the compiler builds an object representing the syntax. An AST node is just part of that tree: for example, one node might be a simple return statement, while another might be a variable assignment. If you want to learn more, I recommend reading over the slides from Prof. Stephen A. Edwards' lecture on abstract syntax trees, or listening to Daniel Sanchez's lecture on compilers. Both resources will give you a better grasp than what I can explain here.

A first-look at Macros

Macros are an under-documented feature of Crystal compared to the other features of the language. They are powerful but also confusing to use; the topic is as advanced as metaprogramming. Before learning how to use macros you should have a good understanding of the rest of Crystal first. Macros have some special syntax, such as {{name}} to insert code or the value of a variable into the code defined by the macro, and a limited subset of language features (some developers say it's too limited and needs improvement). Macros are challenging but rewarding to use; if you are still reading this blog post, give them a try.

How to create a Macro

When trying out a language feature, or if I just want to experiment for a while, I type crystal play into a terminal and load up the in-browser Crystal REPL. Since Crystal is a compiled language, it waits for the user to stop typing, compiles the code quickly, and returns the results in the browser. It's not a pure REPL, but it serves its purpose for me. The Crystal documentation provides an example macro; consider the following code block:

macro define_method(name, content)
  def {{name.id}}
    {{content}}
  end
end
# This generates:
#
#     def foo
#       1
#     end
define_method foo, 1
foo #=> 1

You use the macro keyword to start a block, choose a name, and define any parameters; inside the block you use the parameters to define methods. This happens at compile time, not runtime: it takes the data from your source files, expands it using the macro, and (to oversimplify) copies and pastes the resulting code into your application. You cannot call define_method at runtime. Remember this when using macros in your Crystal programs.

A brief macro syntax summary

I've summarized a few macro syntax features below to give you ideas on what you can do and how you can do it. Several of these examples are from the official docs on macros (which you should read).

Interpolation

You can use {{...}} to interpolate an AST node.
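
For example, this minimal macro pastes its argument into an arithmetic expression at compile time:

macro square(value)
  {{value}} * {{value}}
end

square(3) # expands to 3 * 3 => 9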

Conditionals

macro define_method(name, content)
  def {{name}}
    {% if content == 1 %}
      "one"
    {% elsif content == 2 %}
      "two"
    {% else %}
      {{content}}
    {% end %}
  end
end
define_method foo, 1
define_method bar, 2
define_method baz, 3
foo #=> one
bar #=> two
baz #=> 3

Iterators

macro define_dummy_methods(names)
  {% for name, index in names %}
    def {{name.id}}
      {{index}}
    end
  {% end %}
end
define_dummy_methods [foo, bar, baz]
foo #=> 0
bar #=> 1
baz #=> 2

Use-case: Amber Framework uses Macros to register before and after filters

The Amber Framework supports before_filters and after_filters, which allow you to run (or yield) a block of code before and after a request. It accomplishes this using a domain-specific language which relies on macros internally. Take a look at the following code block.

# amber/src/amber/dsl/callbacks.cr 
module Amber::DSL
  module Callbacks
    macro before_action
      def before_filters : Nil
        filters.register :before do
          {{yield}}
        end
      end
    end
    macro after_action
      def after_filters : Nil
        filters.register :after do
          {{yield}}
        end
      end
    end
  end
end

The module is loaded and defines two macros: a before_action macro which defines a method called before_filters (with no return value) that registers and yields the given code blocks as :before filters, and an after_action macro which does the same through the after_filters method. The copying and generation of code is done at compile time, maximizing performance at run time. A limitation of macros is that a macro cannot rely on run-time information, although the code a macro generates can.

Conclusion

I hope this post gave you a better understanding of macros. Go give macros a try and see what you can build.

Crystal Lang: Building my first web app with Amber

Crystal Language Logo
Logo of the Crystal Language

Recently I heard about a new programming language called Crystal. Crystal is a self-hosted, statically typed, compiled programming language with C-like performance and Ruby-like syntax. I quickly fell in love with Crystal and the Amber Framework. This post talks about my experience using them to develop a web application and provides a critical overview of what Crystal and Amber can and can't do for you. I don't want to say Crystal and Amber are perfect; rather, I want to be transparent about their current flaws so you can make an informed decision about choosing them.

What is Crystal

Crystal is what Ruby would look like if it were statically typed and compiled. Crystal is marketed as having C-like performance; in benchmarks, its performance is closer to that of Rust and Go.

One of the appeals of Crystal to me is that the compiler detects things like nil-unsafe code. The compiler will throw an error if code isn't nil-safe.

While the performance benefits of Crystal are impressive it does have risks involved.

  • Crystal has only one funded full-time core developer; the rest of the core team works in their free time.
  • Crystal’s funding is from its community and a small group of companies using it.
  • There are few Crystal Shards. Shards are the equivalent of RubyGems and NPM packages. This can make development of new programs take longer, as you'll have to write more functionality on your own.

Shards are unique in comparison to RubyGems and NPM packages; they behave more like C++ libraries in that you have to compile them yourself. This has security benefits, and it saves the developers of Crystal from having to fund their own package servers: they can let GitHub (and GitLab and similar services) handle the load for them.

Does Crystal have a Web Framework?

While Crystal looked appealing to me, its website didn't mention any web frameworks. A quick Google search found Amber. The Amber Framework uses Crystal's built-in HTTP server and adds its own features and CLI tools on top to make things easier, so you get all of the performance and type safety benefits of Crystal while using Amber. Amber feels like Rails; it is less mature, but it fits my needs. It follows the model-view-controller paradigm and includes an ORM called Granite.

Amber has some additional goodies included. For example, type amber database into the command line and you'll be dropped into a PostgreSQL shell without having to type out the database name. It's a huge time saver, and many similar touches are included.

Using Amber does have its own set of cons on top of Crystal's. These include:

  • As with Crystal, the Amber Framework developers have little funding (no funding, according to their Liberapay page) and are a small core team, all working on Amber in their free time. Bugs may take longer to patch and there are fewer people out there to support you.
  • There is less documentation available in comparison to projects like Ruby on Rails, although the docs available online explain most things well.
  • Since there are fewer maintainers, it takes longer to get a pull request merged into the project. If you want a bug fixed quickly, you're out of luck; you will have to apply your own diffs to Amber after pulling the shard from GitHub.

Onto my first project using Crystal and the Amber Framework

Now that I've provided the background, I can describe my experience developing an application with them. I had previously been developing a Rails application that organizes ROM hacks of Super Mario 64. As a test of the language and framework, I decided to rewrite the app using them. Since Ruby's syntax is similar to Crystal's, I was able to port the code over in a matter of hours; the rewrite was mostly copying over controllers and views and modifying each one to work in Amber, one by one.

How difficult was the change from ActiveRecord to Granite

Writing the database migration code to copy existing data from Rails to Amber was challenging, as I had never written migration code before. After the rewrite was complete, page loads were three times faster, and it was a positive and pleasant experience for everyone involved. My users didn't notice the change other than the new theme and faster page load times.

What was the end result

You can see the end result of the migration at https://hacks.sm64hacks.com/ and check out the source code at https://gitlab.com/sm64hacks/hackdb. I hope this gives you a nice overview of using Crystal and Amber.

Update

Brian J. Cardiff of the Crystal core team responded to a post on their community forum. Brian said the following about the number of paid core contributors and how their donations are used.

“The donations advertised in the homepage goes to Manas. Manas currently doubles that income as stated in https://crystal-lang.org/2017/12/19/this-is-not-a-new-years-resolution.html 1 .
Those allows Manas to allocate time from developers to invest in crystal. Although I might be de most visible Manas developer during the last months, other developers joins in scoped or sporadic efforts as well.
Those activities are not always visible. Sometimes is brainstorming, discussing ideas, preparing material, coding, experimenting, giving feedback, etc. Some of them are: waj, ggiraldez, mverzilli, and matiasgarciaisaia.”

Brian J. Cardiff on Crystal Community Forum