How We Cut Latency Down by 30k% on Our Git Server

At Clever Cloud, git is at the core of the deployment process. Over the years, our needs have evolved, especially in terms of performance. This is how we dealt with it.

A bit of background first

When we started Clever Cloud, we chose gitolite to manage our git repositories. It appeared to be a complete and functional solution. We used it internally first, to manage our own repositories before the product was even released.

After several months of testing, we were convinced that it was a really good solution, and chose to go with it.

Managing gitolite configuration

gitolite is primarily designed to be configured manually by a sysadmin. You have to describe every user, group of users and repository in configuration files. gitolite then uses these files to generate the internal data it needs to enforce access rights. It also ensures that repositories are created and hooks propagated.

We needed automatic reconfiguration, for quite obvious reasons. We therefore developed a tool called etilotig, whose aim was to keep gitolite's configuration up to date. On startup, this tool would load its initial configuration from the API and then listen to AMQP events to update its configuration cache and write the new gitolite configuration accordingly.

It actually worked quite well, for longer than we expected, and we used it in production until May 5th, 2015.

The drawbacks of gitolite

gitolite was great but had a few major drawbacks we weren't happy with.

As previously said, it wasn't meant to be dynamically configurable, which required some non-trivial hacks.
It required duplicating the configuration in both the API and gitolite itself.
All the repositories were created in the same directory.
At each repository creation, it did a full pass over all the repositories to check whether the hooks were up to date.
Rewriting only part of its configuration wasn't trivial at all, so we ended up rewriting most of it for each change.

These problems were annoying but not critical at the beginning. Over time, some of them became real concerns.

Having all repositories in the same directory is a huge performance issue when the number of repositories becomes really high.

We had actually removed the hook check at each repository creation from the gitolite code a while back, as it accounted for half of the time gitolite spent regenerating its internal configuration on each update.

The new etilotig

While gitolite used to be efficient, it had lately developed a lot of performance problems: depending on the timing of the events, it could take up to five (!) minutes to create a new repository. This went far beyond our acceptable limits, so we had to find another solution.

The idea was simple: drop gitolite totally and improve our configuration management tool etilotig to do what was needed by itself.

The checklist we needed to cover was quite short:

ssh keys management
authorization management
repositories creation
hooks installation

Ssh keys management

When its only goal was to manage the gitolite configuration, etilotig simply forwarded ssh keys to gitolite, which in turn handled them. Now we have to manage the authorized_keys file ourselves.

We basically write a complete new file each time an ssh key is added or removed, and then replace the old file with the new one.

Each line has the following form:

command="AUTH_SCRIPT USER_ID",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty PUBLIC_KEY

Here AUTH_SCRIPT points to the authorization script that I'll describe later, USER_ID is the user id of the key owner and PUBLIC_KEY is their public key.

The authorization script will thus be called with the user id as first parameter.
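
The write-then-replace step can be sketched as follows. This is a minimal sketch, not etilotig's actual code: the function names render_key_line and write_authorized_keys, and the "user_id<TAB>public_key" input format, are assumptions made for illustration. Writing to a temporary file next to the target and renaming it over the old file means a reader never sees a half-written authorized_keys.

```shell
#!/bin/bash

# Hypothetical helper: print one authorized_keys line in the format shown above.
render_key_line() {
    local auth_script="${1}"
    local user_id="${2}"
    local public_key="${3}"

    printf 'command="%s %s",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty %s\n' \
        "${auth_script}" "${user_id}" "${public_key}"
}

# Hypothetical helper: read "user_id<TAB>public_key" records on stdin,
# write a complete new file, then rename it over the old one.
write_authorized_keys() {
    local auth_script="${1}"
    local target="${2}"
    local tmp user_id public_key

    # Create the temporary file next to the target so that mv is a plain
    # rename within one filesystem.
    tmp=$(mktemp "${target}.XXXXXX") || return 1

    while IFS=$'\t' read -r user_id public_key; do
        render_key_line "${auth_script}" "${user_id}" "${public_key}"
    done > "${tmp}"

    chmod 600 "${tmp}"
    mv -f "${tmp}" "${target}"
}
```

It would be called with the full key list on stdin each time a key is added or removed, for example: write_authorized_keys "${AUTH_SCRIPT}" "${HOME}/.ssh/authorized_keys" < keys.tsv (keys.tsv being a hypothetical dump of the current keys).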

That solves the first point of our TODO list.

Repositories management

At startup, etilotig writes some bash configuration files defining basic things such as the directory in which repositories have to be created.

Then we have a tiny bash script in charge of the repository creation. We now create them in a deeper directory hierarchy to get as few repositories as possible per directory. Say we had /data/app_18c6021b-0860-4f97-a08d-0663f45cf3f0.git before, we now have /data/app_18/c6/02/app_18c6021b-0860-4f97-a08d-0663f45cf3f0.git which makes things waaaaay faster.

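#!/bin/bash

# Create a bare repository in the given directory.
create_repo() {
    local repo_dir="${1}"

    mkdir -p "${repo_dir}"
    pushd "${repo_dir}" &>/dev/null || return 1
    git init --bare
    popd &>/dev/null
}

main() {
    local repo="${1}"
    local repo_dir

    # e.g. app_18c6021b-….git -> ${REPOS_DIR}/app_18/c6/02/app_18c6021b-….git
    repo_dir="${REPOS_DIR}/${repo:0:6}/${repo:6:2}/${repo:8:2}/${repo}"

    # Only create the repository if it does not exist yet.
    [[ -d "${repo_dir}"/hooks ]] || create_repo "${repo_dir}"
}

# Defines REPOS_DIR, among other things.
. "${HOME}"/.etilotig/.etilotigrc

main "${@}"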
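
To make the layout concrete, the path derivation can be isolated in a small function. This is only an illustrative sketch: repo_path is a hypothetical name, and /data is taken from the example above rather than from the real REPOS_DIR configuration.

```shell
#!/bin/bash

# Hypothetical helper: map a repository name to its sharded directory.
# The first 6 characters, then two groups of 2 characters, become the
# intermediate directory levels.
repo_path() {
    local repos_dir="${1}"
    local repo="${2}"

    echo "${repos_dir}/${repo:0:6}/${repo:6:2}/${repo:8:2}/${repo}"
}

repo_path /data app_18c6021b-0860-4f97-a08d-0663f45cf3f0.git
# prints /data/app_18/c6/02/app_18c6021b-0860-4f97-a08d-0663f45cf3f0.git
```

Assuming every name starts with app_ followed by a hexadecimal UUID, each of the three levels is keyed on two hexadecimal characters and therefore holds at most 256 subdirectories.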

Then we have another one for hooks installation.

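#!/bin/bash

# Expand to nothing (instead of a literal "*") when there are no hooks.
shopt -s nullglob

main() {
    local repo="${1}"
    local repo_dir
    local hook

    repo_dir="${REPOS_DIR}/${repo:0:6}/${repo:6:2}/${repo:8:2}/${repo}"

    # Symlink every shared hook into the repository's hooks directory.
    for hook in "${HOME}"/.etilotig/hooks/*; do
        ln -sf "${hook}" "${repo_dir}"/hooks/
    done
}

. "${HOME}"/.etilotig/.etilotigrc

main "${@}"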

With those two simple scripts in place, we only have to call them for each repository at startup, to ensure everything is OK, and again at each repository creation. That solves two more of the four points from our TODO list, leaving only one.
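
The startup pass could look roughly like the sketch below. It is written under assumptions: ensure_all_repos is a hypothetical name, the list of repository names is assumed to arrive one per line on stdin (how etilotig actually obtains it from the API is not shown here), and the two scripts are passed in as arguments because their real paths are not given.

```shell
#!/bin/bash

# Hypothetical startup pass: read one repository name per line on stdin and
# run the creation command, then the hook-installation command, for each.
ensure_all_repos() {
    local create_cmd="${1}"
    local hooks_cmd="${2}"
    local repo

    while IFS= read -r repo; do
        # Skip empty lines.
        [[ -n "${repo}" ]] || continue
        # Redirect stdin so the commands cannot consume the list of names.
        "${create_cmd}" "${repo}" < /dev/null \
            && "${hooks_cmd}" "${repo}" < /dev/null
    done
}
```

Because the creation script skips repositories that already exist and the hook script uses ln -sf, running this pass repeatedly should be harmless.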

Authorization

Now, the goal was not to duplicate the configuration anymore, but rather use the configuration from the API, thus externalising the whole thing. Dropping all the configuration management from etilotig reduced its size by more than 50%.

The way gitolite manages authorization is quite simple: when it generates its internal configuration, it actually generates a perl script which is called on each ssh connection attempt. The script then authorises or rejects the transaction.

We made a similar script. It prints users some information about the repositories they have access to if they just run ssh git@push.par.clever-cloud.com or similar, checks whether they are authorized when they try to git push/pull, and rejects any other request.

It looks like this (with some extra stuff added):

#!/bin/bash

# Refuse to run outside an ssh session; default to "info" without a command.
sanity_check() {
    if [[ -z "${SSH_CONNECTION}" ]]; then
        echo "Who the hell are you?" >&2
        exit 1
    fi

    if [[ -z "${SSH_ORIGINAL_COMMAND}" ]]; then
        export SSH_ORIGINAL_COMMAND="info"
    fi
}

ask_for_info() {
    local userid="${1}"

    # make the request to the API to retrieve user info message
    echo "some info"
}

ask_for_authorization() {
    local userid="${1}"
    local appid="${2}"

    # make the request and return the HTTP status code here. 200 means authorized.
    echo 200
}

# Return 0 only for git push/pull verbs that the API authorizes.
authorize() {
    local userid="${1}"
    local verb="${2}"
    local appid="${3}"
    local ret=1

    case "${verb}" in
        "git-receive-pack"|"git-upload-pack")
            local code
            code=$(ask_for_authorization "${userid}" "${appid}")
            [[ "${code}" == "200" ]] && ret=0
            ;;
    esac

    return "${ret}"
}

final_abort() {
    echo "What are you trying to achieve here?" >&2
    exit 2
}

main() {
    sanity_check

    # The user id comes from the command= entry in authorized_keys.
    local userid="${1}"
    local verb
    local repo
    local repo_dir

    verb=$(echo "${SSH_ORIGINAL_COMMAND}" | cut -d ' ' -f 1)
    repo=$(echo "${SSH_ORIGINAL_COMMAND}" | cut -d ' ' -f 2 | tr -d "'\"")
    [[ "${repo}" == /* ]] && repo=${repo:1}
    repo_dir="${REPOS_DIR}/${repo:0:6}/${repo:6:2}/${repo:8:2}/${repo}"

    if [[ "${verb}" == "info" ]]; then
        ask_for_info "${userid}"
    elif authorize "${userid}" "${verb}" "${repo}"; then
        export CC_USER="${userid}"
        export CC_NOTIFY_SCRIPT="${HOME}/.etilotig/send-push-event"
        exec "${verb}" "${repo_dir}"
    else
        final_abort
    fi
}

. "${HOME}"/.etilotig/.etilotigrc

main "${@}"
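
To see what main() extracts from SSH_ORIGINAL_COMMAND, the parsing can be pulled out into a standalone function and fed a sample command. This is a sketch for illustration: parse_command is a hypothetical name, and the sample value assumes the usual shape of the command git sends over ssh, a verb followed by a quoted repository path.

```shell
#!/bin/bash

# Hypothetical helper: split an SSH_ORIGINAL_COMMAND value into the git verb
# and the repository name, using the same steps as main() above.
parse_command() {
    local original="${1}"
    local verb repo

    verb=$(echo "${original}" | cut -d ' ' -f 1)
    # Second field, with single and double quotes stripped.
    repo=$(echo "${original}" | cut -d ' ' -f 2 | tr -d "'\"")
    # Drop a single leading slash, as in "/app_<id>.git".
    [[ "${repo}" == /* ]] && repo=${repo:1}

    echo "${verb} ${repo}"
}

parse_command "git-receive-pack '/app_18c6021b-0860-4f97-a08d-0663f45cf3f0.git'"
# prints git-receive-pack app_18c6021b-0860-4f97-a08d-0663f45cf3f0.git
```

Only git-receive-pack (push) and git-upload-pack (fetch, pull, clone) pass the verb check in authorize; anything else ends in final_abort.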

With this in place, nearly everything was ready. We added two tiny hooks on top of that: one to only allow users to push to the master branch, and one to trigger a deployment on git push:

hooks/update

#!/bin/bash

main() {
    # git passes the name of the ref being updated as the first argument.
    local rev="${1}"

    # Tags are always accepted.
    if [[ "${rev}" == refs/tags/* ]]; then
        exit 0
    fi

    if [[ "${rev}" != "refs/heads/master" ]]; then
        echo "You tried to push to a custom branch."
        echo "Only master is allowed."
        exit 1
    fi
}

main "${@}"

hooks/post-update

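#!/bin/bash

# Pushing a tag does not trigger a deployment.
sanity_check() {
    local rev="${1}"
    if [[ "${rev}" == refs/tags/* ]]; then
        exit 0
    fi
}

main () {
    # git passes the name of the updated ref as the first argument.
    local rev="${1}"
    sanity_check "${rev}"

    local repo appId commitId
    # Hooks run inside the repository, so its directory name is the repo name.
    repo=$(basename "$(pwd)")
    appId="${repo%.git}"
    commitId=$(git rev-parse "${rev}")

    "${CC_NOTIFY_SCRIPT}" "${appId}" "${commitId}" "${CC_USER}"
    echo "[SUCCESS] The application has successfully been queued for redeploy."
}

main "${@}"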

Conclusion

That's it, we have our new git server manager up and running, which works pretty well.

The performance gain? We went from between 3 and more than 5 minutes to less than 1 second per action, while dropping the whole gitolite codebase and reducing the size of etilotig by 50%, with an average performance gain of 30k%.
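
One way to read the 30k% figure is as a simple ratio. The numbers below are round values assumed from the range quoted above (about 5 minutes before, about 1 second after), not a separate measurement:

```latex
\[
\frac{t_{\text{before}}}{t_{\text{after}}}
  = \frac{300\ \text{s}}{1\ \text{s}}
  = 300
  = 30\,000\,\%
\]
```

With the 3-minute lower bound, the same ratio is 180, or 18 000%.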

gitolite has been very useful, both in use and through its codebase, which helped us better understand the whole authentication mechanism, so many thanks to this awesome tool.

Now gitolite is dead, long live etilotig!
