Git with Gitlab

To provide a self-hosted alternative to github, we run gitlab. It gives all our members a VCS service and also serves as a centralized place to store code and projects written for the ACM.

Getting gitlab configured was unfortunately a non-trivial amount of work. If it ever needs to be reinstalled or upgraded, we may need to redo these steps. Fortunately, the current configuration is mostly backed up, and upstream improvements in gitlab have made things significantly less painful.

Ideally all of this would be packaged or scripted via ansible, but until then...

Installing gitlab

This is at least pretty simple. Download the omnibus-gitlab package from https://about.gitlab.com/downloads/, and follow the instructions for Debian 8 (reproduced here):

sudo apt-get install curl openssh-server ca-certificates postfix
curl -sS https://packages.gitlab.com/install/repositories/gitlab/gitlab-ce/script.deb.sh | sudo bash
sudo apt-get install gitlab-ce

Congratulations, gitlab is installed! We are not, of course, done yet, because it has to be configured.

Configuring gitlab

The gitlab documentation is pretty good. For the omnibus-gitlab packages, configuration works like this: you write Ruby configuration in /etc/gitlab/gitlab.rb, then run gitlab-ctl reconfigure, which generates the YAML configuration actually used by gitlab. Then you run gitlab-ctl start, which starts all the services. (You can also stop or restart them.)
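As a rough sketch of that cycle (not an official gitlab script), the sequence after editing gitlab.rb might be wrapped up like this. The GITLAB_CTL variable and the apply_gitlab_config function name are our own inventions so the sequence can be dry-run on a machine without gitlab; only the reconfigure, restart and status subcommands come from gitlab-ctl itself.

```shell
#!/bin/sh
# Sketch of the edit -> reconfigure -> restart cycle for omnibus-gitlab.
# GITLAB_CTL is a hypothetical override (not part of gitlab) so the
# sequence can be tried without a real gitlab install.

apply_gitlab_config() {
    ctl="${GITLAB_CTL:-gitlab-ctl}"
    # Regenerate the real configuration from /etc/gitlab/gitlab.rb.
    $ctl reconfigure || return 1
    # Restart the services so they pick up the new configuration.
    $ctl restart || return 1
    # Show which services came back up.
    $ctl status
}
```

On the gitlab server this would be run as root, e.g. by calling apply_gitlab_config after saving your changes to /etc/gitlab/gitlab.rb.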

Minimally, you want to edit /etc/gitlab/gitlab.rb and make it look something like the following:

external_url 'http://git.acm.jhu.edu'

# Gitlab settings.
gitlab_rails['gitlab_default_projects_limit'] = 64000
gitlab_rails['gitlab_email_from'] = 'gitlab@git.acm.jhu.edu'

The following sections describe various other configuration topics.

Basic Configuration

Kerberos Configuration

Add the following omniauth configuration to /etc/gitlab/gitlab.rb

# Omniauth configuration
gitlab_rails['omniauth_enabled'] = true
gitlab_rails['omniauth_allow_single_sign_on'] = true
gitlab_rails['omniauth_block_auto_created_users'] = false
gitlab_rails['omniauth_providers'] = [
  {
    "name" => "kerberos",
  }
]

Run gitlab-ctl reconfigure and, at this point, you should be done!

Note that this will not work optimally. We currently use it because it was the simplest auth system to get running, but users created via Kerberos login will not have passwords associated with their account and will be unable to clone over HTTP. They will also be unable to clone by getting Kerberos tickets.

Gitlab EE has proper Kerberos support (that makes it a first class citizen alongside, say, LDAP auth), but we don’t have the budget for that. Fortunately, however, the omniauth-kerberos gem is included inside the gitlab CE distribution. If it were not (which used to be the case), you’d want to follow the instructions below.

Kerberos (Legacy)

Adding the omniauth-kerberos gem to an omnibus gitlab install was surprisingly challenging. The instructions I followed originally are reproduced here, but you should not need them anymore unless upstream gitlab significantly breaks something.

I more-or-less followed gitlab’s instructions for Kerberos auth. However, the instructions there are no longer accurate. You want to (more detailed instructions coming later):

  1. Put the binaries in /opt/gitlab on the system path.
  2. Add the omniauth-kerberos gem to a Gemfile in /opt/gitlab.
  3. Run bundle install --without development test mysql --no-deployment.
  4. You sadly need to disable the CSRF Omniauth check, otherwise everything blows up. See this discussion. Note that this is no longer necessary as of gitlab 7.10+.

SSL

Add this setting to /etc/gitlab/gitlab.rb

nginx['redirect_http_to_https'] = true

Also, change the external_url setting to https.

external_url 'https://git.acm.jhu.edu'

Then put the private key in /etc/gitlab/ssl/git.acm.jhu.edu.key and the x509 CRT in /etc/gitlab/ssl/git.acm.jhu.edu.crt. (Change “git.acm.jhu.edu” to whatever the URL of the gitlab instance is.)

Then gitlab-ctl reconfigure && gitlab-ctl restart and SSL should be working.

Note that in order to add the CRT bundle required to verify our SSL certificate, you should use cat to combine the actual certificate and the bundle into a single file, with the certificate first. This is in contrast to Apache, which still supports a (legacy?) way to specify the bundle separately.
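A minimal sketch of that concatenation follows. The function name and the input file names in the usage line are placeholders, not the names of our real files; the only thing that matters is that the server certificate comes before the bundle in the combined file.

```shell
#!/bin/sh
# Sketch: build the combined certificate file that the bundled nginx reads.
# The server certificate goes first, followed by the CA bundle.
# All file names are supplied by the caller; none are the ACM's real paths.

build_cert_chain() {
    cert="$1"; bundle="$2"; out="$3"
    # Refuse to write anything if either input is missing or unreadable.
    [ -r "$cert" ] && [ -r "$bundle" ] || {
        echo "missing certificate or bundle" >&2
        return 1
    }
    cat "$cert" "$bundle" > "$out"
}
```

For example, build_cert_chain server.crt ca-bundle.crt /etc/gitlab/ssl/git.acm.jhu.edu.crt would write the combined file to the path mentioned above. The output must be a different file from both inputs, since the redirect truncates it before cat reads anything.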

External Postgresql

This is pretty simple to set up. We’re running a postgres database on einstein (the database server), where we have created a gitlab account and a gitlabhq_production database (this is what gitlab expects its database to be called). The gitlab configuration looks like this:

gitlab_rails['db_encoding'] = 'utf8'
gitlab_rails['db_host'] = 'einstein.vm.acm.jhu.edu'
gitlab_rails['db_port'] = '5432'
gitlab_rails['db_username'] = 'gitlab'
gitlab_rails['db_password'] = 'CHANGE_ME' # Replace with the real password.

Then just create the right auth configuration in pg_hba.conf on einstein and everything should be good.
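As an illustration of what "the right auth configuration" could look like, here is a small helper that prints a pg_hba.conf entry for the gitlab user and database named above. The helper name, the choice of md5 password auth, and the /32 single-host mask are assumptions on our part; the client address is whatever address the gitlab server connects from.

```shell
#!/bin/sh
# Sketch: print a pg_hba.conf line that lets the gitlab role reach the
# gitlabhq_production database from one client address with password auth.
# md5 and the /32 mask are assumptions; adjust to local policy.

pg_hba_entry() {
    client_ip="$1"
    # Fields: connection type, database, user, address, auth method.
    printf 'host\t%s\t%s\t%s/32\t%s\n' \
        gitlabhq_production gitlab "$client_ip" md5
}
```

The output would be appended to pg_hba.conf on einstein (for instance pg_hba_entry 192.0.2.10, where 192.0.2.10 is a documentation-only placeholder address), followed by a postgres reload.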

LDAP Configuration (Legacy)

This is not actually how we’re configuring gitlab anymore, and the syntax itself was deprecated, but here is a block of /etc/gitlab/gitlab.rb for legacy purposes

# LDAP configuration.
gitlab_rails['ldap_enabled'] = true
gitlab_rails['ldap_host'] = 'ldap.acm.jhu.edu'
gitlab_rails['ldap_port'] = 389
gitlab_rails['ldap_uid'] = 'uid'
gitlab_rails['ldap_method'] = 'plain' # 'ssl' or 'plain'
gitlab_rails['ldap_bind_dn'] = 'uid=query user,ou=People,dc=acm,dc=jhu,dc=edu'
gitlab_rails['ldap_password'] = ''
gitlab_rails['ldap_allow_username_or_email_login'] = true
gitlab_rails['ldap_base'] = 'ou=People,dc=acm,dc=jhu,dc=edu'

Run gitlab-ctl reconfigure, and if gitlab’s LDAP implementation was sane, we’d be done.

Unfortunately, gitlab’s LDAP configuration is not sane. It thinks email is the primary attribute for users, not usernames, which means that users using LDAP cannot change their emails... because that would allow them to impersonate any other user. Also, we would need to create a bind DN for gitlab to connect to.

Instead, we’ve opted to use Kerberos authentication for now using a custom omniauth provider, despite potential drawbacks. This is a temporary fix and should hopefully be dealt with later. I am not, however, sure how we would be able to merge the Kerberos accounts with LDAP accounts.

AFS Repositories

In order to get gitlab to store its repositories in AFS, you first need to create a principal (we use host/git.vm.acm.jhu.edu@ACM.JHU.EDU and rcmd.git for Kerberos and AFS respectively) and land its keytab on the gitlab server. Having done so, we then need to make sure that gitlab runs with AFS tokens for that principal, and that the storage location in AFS (we use /afs/acm.jhu.edu/service/gitlab/git-data) is readable and writable by that principal.
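The AFS side of that can be sketched as below, assuming the OpenAFS fs command is available and you hold admin tokens. The DRY_RUN switch and both function names are hypothetical conveniences; creating the Kerberos principal and keytab is left out because it depends on the KDC tooling in use.

```shell
#!/bin/sh
# Sketch: give the gitlab AFS principal write access to its storage dir.
# DRY_RUN is a hypothetical switch: when set, commands are printed
# instead of executed.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

grant_gitlab_afs_access() {
    dir="${1:-/afs/acm.jhu.edu/service/gitlab/git-data}"
    principal="${2:-rcmd.git}"
    # "write" is the fs shorthand for the rlidwk rights.
    run fs setacl -dir "$dir" -acl "$principal" write || return 1
    # Print the resulting ACL so it can be eyeballed.
    run fs listacl "$dir"
}
```

Note that AFS ACLs are per directory: subdirectories created afterwards inherit the parent's ACL, but any that already exist need the same treatment.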

gitlab-kerberize

Since omnibus-gitlab / gitlab-ctl uses runit under the hood to manage services, it was really easy to create another service and land it in /opt/gitlab/sv. Much like the webserver’s configuration for serving up CGI scripts, the run file for this service looks like the following:

#!/bin/sh
exec /opt/gitlab/sv/kerberize/kerberize.sh git

And it invokes the kerberize script (copied from the webserver) that does this:

#!/bin/sh
USR=$1
export AKLOG="su ${USR} -s /bin/sh -c /usr/bin/aklog"
exec k5start -U -t -K 600 -l 12h -k KEYRING:user -f /etc/krb5.keytab

(Note that I named the service “kerberize” but this should probably be stylized as “gitlab-kerberize” if we want to, say, release this.)

Then simply run gitlab-ctl restart and it should start this service (though for whatever reason, the service won’t be listed in the output). To confirm, become the git user and run tokens.
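If you would rather script that confirmation, something like the following might do. It assumes the tokens output contains a line of the form "... tokens for ...acm.jhu.edu ..." when a token is held, which matches the OpenAFS versions we have seen but is not guaranteed; the function name is made up.

```shell
#!/bin/sh
# Sketch: read `tokens` output on stdin and succeed only if a token for
# the given cell is listed. The "tokens for" wording is an assumption
# about the output format of the installed OpenAFS client.

has_afs_token() {
    cell="$1"
    grep "tokens for" | grep -qF -- "$cell"
}
```

Usage would be along the lines of: su git -s /bin/sh -c tokens | has_afs_token acm.jhu.edu && echo "git has AFS tokens".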

gitlab-ctl reconfigure issues

Note

This is no longer an issue; a version of this feature was implemented as of a recent gitlab version. The information below has been preserved for historical reasons; now, the option manage_storage_directories['enable'] = false simply needs to be added to /etc/gitlab/gitlab.rb.

In order to run gitlab-ctl reconfigure, we sadly need to patch it. I have sent this patch upstream but it’s very site specific so I’m not sure if they would take it. Regardless, the patch is detailed here.

The reconfigure script invokes a chef recipe that tries to create the configured repository directory, set ownership to git:git, and set specific Unix permissions. Attempting to make this chown in AFS, however, causes an “insufficient permissions” failure even when you run reconfigure with the right AFS tokens. The failure is in “chown_internal”, which suggests a very low-level issue. Fortunately, since we had to manually set up the AFS permissions anyway, we can simply apply the following patch. This is applied to the file “/opt/gitlab/embedded/cookbooks/gitlab/recipes/gitlab-shell.rb”; specifically, we want to modify line 31 as follows:

directory repositories_path do
  if not repositories_path.start_with? "/afs"
    owner git_user
    group git_group
    mode "2770"
    recursive true
  end
end

This will only attempt to set ownership if the repository path is not in AFS.

With any luck this will land in upstream, but if not we may need to occasionally re-apply it ourselves.

Changing the repository_path directory

Now, we can simply write the following in our gitlab configuration:

gitlab_rails['gitlab_shell_repos_path'] = "/afs/acm.jhu.edu/service/gitlab/git-data/repositories/"

And then run reconfigure and restart and everything should just work!

Note that repositories stored in AFS may benefit from good AFS caching, or (if running on top of the ACM’s ceph instance in a virtual machine) a RAM-only cache.

Note

As of a recent gitlab release, this has been changed. While it’s not immediately clear why, the following line is now needed in the gitlab configuration: git_data_dirs({"default" => "/afs/acm.jhu.edu/service/gitlab/git-data"}). Note that we’re not specifying the full repository path, just the path to git-data.

CI Runners

We use Openstack to install gitlab CI runners. We aren’t currently offering a runner image, but a runner is currently a standard Debian Jessie VM with the jhuacm-dev and jhuacm-utils-admin packages installed.

Note that CI runners need git.acm.jhu.edu to resolve to the same address as git.vm.acm.jhu.edu due to Openstack firewall issues. Currently this needs to be set manually in /etc/hosts on each boot, because cloud-init scribbles over the hosts file (I think). This is not ideal.
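Until that is fixed properly, the manual step could be scripted roughly as follows. The function name is made up, and the helper only appends a line when no uncommented line in the hosts file already mentions the name, so it is safe to run on every boot.

```shell
#!/bin/sh
# Sketch: idempotently pin a host name to an address in a hosts file.
# The third argument defaults to /etc/hosts; pass another file to test.

pin_host() {
    ip="$1"; name="$2"; hosts="${3:-/etc/hosts}"
    # Do nothing if an uncommented line already mentions the name.
    if grep -v '^[[:space:]]*#' "$hosts" | grep -qwF -- "$name"; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts"
}
```

On a runner this might be called as pin_host "$(getent hosts git.vm.acm.jhu.edu | awk '{ print $1; exit }')" git.acm.jhu.edu. If cloud-init really is what rewrites the file, its manage_etc_hosts setting may also be worth a look.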

Using AFS with Gitlab CI

If you want your CI builds to be able to run as a Kerberos principal, you can do that! We hope to make this ultimately a better experience, but here is the workflow that was used to set up the debian-metapackages project.

Assuming you have already set up a Kerberos and AFS principal and given it access to the directories you need to be able to access, and assuming you’ve created a keytab, we can use Gitlab CI’s “hidden variables” function to store a base64-encoded copy of the keytab (which you can produce by running base64 /path/to/keytab). At the moment you must manually copy the encoded keytab into the “Variables” tab under the Build Settings of your project.

Then, the following gitlab CI yaml script can be used to do things:

script:
  - echo "$KRB_KEYTAB" > ci.keytab.enc
  - dos2unix ci.keytab.enc
  - base64 -d ci.keytab.enc > ci.keytab
  - kinit -kt ci.keytab principal@ACM.JHU.EDU
  - aklog acm.jhu.edu
  # Run your actual code here! For instance, deploy to a public_html directory.
  - rm ci.keytab*
  - kdestroy
  - unlog
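For the encoding step itself, a small sketch that produces the value for the CI variable and checks that it decodes back to the original keytab. It assumes GNU coreutils base64 (as found on the Debian runners); the -w 0 flag keeps the output on a single line, which is easier to paste. Both function names are made up for this example.

```shell
#!/bin/sh
# Sketch: produce a single-line base64 copy of a keytab for pasting into
# a CI variable, and check that it decodes back to the same bytes.
# Assumes GNU coreutils base64 (-w 0 turns off line wrapping).

encode_keytab() {
    base64 -w 0 "$1"
}

check_keytab_roundtrip() {
    keytab="$1"
    # Succeeds only if decoding the encoded form reproduces the file.
    encode_keytab "$keytab" | base64 -d | cmp -s - "$keytab"
}
```

Running check_keytab_roundtrip /path/to/keytab before pasting is a cheap way to catch a mangled copy early.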

Repairing Corrupt Pack Indexes

(Transferred from a rather old document...)

If a disk has eaten your homework and you’re seeing things like error: wrong index v1 file size in ${INDEXFILE}, running git index-pack ${PACKFILE} will probably help, as it rebuilds the index from the corresponding packfile. One can only hope that the disk did not simultaneously eat the pack file.
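If several packs are affected, the same command can be looped over a whole repository. This is only a sketch: the function name is made up, it takes the path to the .git directory (or to a bare repository), and it moves each old index aside with a .bad suffix rather than deleting it.

```shell
#!/bin/sh
# Sketch: rebuild the .idx file for every pack in a repository.
# $1 is the .git directory (or a bare repository).

rebuild_pack_indexes() {
    repo="$1"
    for pack in "$repo"/objects/pack/*.pack; do
        # The glob stays literal when there are no packs; skip that case.
        [ -e "$pack" ] || continue
        idx="${pack%.pack}.idx"
        # Keep the old index around instead of deleting it outright.
        [ -e "$idx" ] && mv "$idx" "$idx.bad"
        git --git-dir="$repo" index-pack "$pack" || return 1
    done
}
```

Afterwards, git fsck in the repository is a reasonable way to see whether the pack files themselves survived.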