Monday, May 24, 2021

Out of touch with C++, and other new stuff

MongoDB is quite cool. Apart from being written in C++, it's a document DB (one that can store and index JSON / JSON-like content) and a distributed database that supports sharding, replication, and eventual consistency. You can run it as a single server too, which is useful when writing application code that connects to it.

  1. The server is called mongod.
  2. There is a REPL client / interpreter called mongo which also happens to be a full-fledged JavaScript interpreter.
  3. A single Mongo instance can host multiple databases, each stored in its own file.
  4. Each database can have multiple collections, analogous to tables in an RDBMS.
  5. Each collection has zero or more documents, corresponding to rows in an RDBMS, but the similarities are faint.
    1. A document is JSON or JSON-like content. JSON-like refers to the fact that extensions to the JSON format are supported, with value types like Date, binary, etc.
    2. Every document has an identifier - the _id attribute - which must be unique within a collection. Its default type is ObjectId, which is a 12-byte value, but it can be of any type. The 12 bytes are partitioned from left to right as "4[epoch seconds]|3[hostname hash]|2[pid]|3[incr num]".
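
To make the 4|3|2|3 partitioning concrete, here is a minimal sketch that splits the 24-character hex form of an ObjectId into those four fields. The names ObjectIdParts and parseObjectId are my own, hypothetical helpers and not part of any MongoDB driver, and the sketch assumes exactly the layout listed above.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Fields of an ObjectId under the 4|3|2|3 byte layout described above.
struct ObjectIdParts {
    std::uint32_t epochSeconds;  // bytes 0-3
    std::uint32_t hostHash;      // bytes 4-6
    std::uint32_t pid;           // bytes 7-8
    std::uint32_t counter;       // bytes 9-11
};

// Hypothetical helper (not a MongoDB API): parses the 24 hex characters
// of an ObjectId string into its four fields.
ObjectIdParts parseObjectId(const std::string& hex) {
    if (hex.size() != 24) {
        throw std::invalid_argument("expected 24 hex characters");
    }
    // Each byte is two hex characters, so offsets and lengths are doubled.
    auto field = [&hex](std::size_t pos, std::size_t len) {
        return static_cast<std::uint32_t>(
            std::stoul(hex.substr(pos, len), nullptr, 16));
    };
    return ObjectIdParts{field(0, 8), field(8, 6), field(14, 4), field(18, 6)};
}
```

For example, parseObjectId("507f1f77bcf86cd799439011").epochSeconds comes out as 1350508407, which is the creation time in seconds since the epoch.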

The main meat of the topic is perhaps in dealing with distributed data - sharding, indexing, consistency, and all the tools used to implement these. There are also specifics like expression languages for queries, as well as client libraries. There are also many specifics about working with the mongo REPL. Last, but by no means least, is the topic of building applications using Mongo - a topic that influences how application data is expressed via Mongo, and consumed from it. More on that in another post.

C++ has been an old love affair, a fact that alienates me from a lot of well-meaning programmers at the outset, and endears me to a few. But truth be told I have not written a lot of C++ over the past few years and have grown my own contrarian opinions about the usefulness of many recent additions to the language. I didn't get a chance to work with fold expressions earlier but looked at them recently.

To me fold expressions appear to be a syntactic convenience for unrolling loops involving parameter packs without explicitly writing the templates and specializations that were needed before. The code does become shorter, but does it really become more expressive? I don't know - I think it becomes a little cryptic / terse because the syntax doesn't intuitively express what's happening. You have to know it and get used to it, as with parameter pack expansions involving function and template expressions.
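
For comparison, here is a small sketch of the same pack reduction written both ways: once as a C++17 fold expression and once with the base case plus recursive template that was needed earlier. The function names sum, sumRecursive and concat are mine, chosen only for illustration.

```cpp
#include <sstream>
#include <string>

// C++17 unary right fold: (args + ...) expands to a1 + (a2 + (... + aN)).
// The pack must not be empty for operator+.
template <typename... Args>
auto sum(Args... args) {
    return (args + ...);
}

// The older style: a base case for one argument...
template <typename T>
T sumRecursive(T value) {
    return value;
}

// ...plus a recursive variadic overload that peels off one argument at a time.
template <typename T, typename... Rest>
auto sumRecursive(T first, Rest... rest) {
    return first + sumRecursive(rest...);
}

// Binary left fold over operator<<: streams every argument in order.
template <typename... Args>
std::string concat(const Args&... args) {
    std::ostringstream out;
    (out << ... << args);
    return out.str();
}
```

Both sum(1, 2, 3, 4) and sumRecursive(1, 2, 3, 4) evaluate to 10. Whether (args + ...) reads better than the two overloads is exactly the judgment call discussed above.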

I don't have much to argue on the matter. The C++ folks can always shut me up by saying that this is a tool for library writers that people like me can ignore. Maybe they are right. But I do wonder, why library writers have to put up with cryptic syntax. Isn't simplicity of value to them?



Friday, December 18, 2020

Recent pitfalls at work

Spent unproductive hours debugging issues that should have taken less time. Why?
  1. I implemented something without experimenting enough, based on dodgy documentation (in fact no good documentation is available).
  2. An oversight in my analysis / design. I got away with a small fix but sometimes this can be costly.
  3. Didn't check if my code ran on all environments it was targeted at. For example, I introduced a dependency on OpenSSL without verifying if it can be fulfilled everywhere.

 #1 is easy to address. Run more small experiments as a habit while building software. I do blame the dodgy doc too. I eventually ran the experiments and it took some effort to be honest.

#3 should be done better as a habit. Listing assumptions and environment requirements explicitly while making them should be a good start.

#2 was disappointing because I pride myself on design thinking. It was a small detail that I missed through oversight. The way to catch this early would be to deliberately list all preconditions.


Monday, September 14, 2020

Enable autocomplete for C++ on vim

 Assuming you have a recent version of vim (8.1 should do) this is a lot easier now.

  1. If you don't already have Vundle:
    git clone https://github.com/VundleVim/Vundle.vim.git ~/.vim/bundle/Vundle.vim
  2. Add the following section to ~/.vimrc. The vundle section may already be there in which case you just add the Plugin line for Valloric/YouCompleteMe.
    set rtp+=~/.vim/bundle/Vundle.vim
    call vundle#begin()
    Plugin 'VundleVim/Vundle.vim'
    Plugin 'Valloric/YouCompleteMe'
    call vundle#end()
  3. Clone YouCompleteMe.
    git clone https://github.com/Valloric/YouCompleteMe.git ~/.vim/bundle/YouCompleteMe
  4. Update YouCompleteMe.
    cd ~/.vim/bundle/YouCompleteMe && git submodule update --init --recursive
  5. Install the clang completer. This works but a better way may be to configure clangd.
    ./install.py --clang-completer
  6. Add a file .ycm_extra_conf.py to some ancestor directory of your C++ projects.
    def Settings( **kwargs ):
      return {
        'flags': [ '-x', 'c++', '-Wall', '-Wextra', '-Werror' ],
      }

    There are other ways of doing the above that can give you more flexibility in your own sandbox. You can check them here: https://github.com/ycm-core/YouCompleteMe#general-semantic-completion



Sunday, August 23, 2020

Algorithms: Two interesting problem solving techniques

I recently learned two interesting techniques for solving problems using algorithms that could be called non-obvious and ingenious. But I think there is a pattern to it that is worth recognizing, and hence this post. These problems are courtesy one or the other of the several good online platforms for practicing algorithmic problems. The code is mine.

Using BFS to solve a dynamic programming problem

(This section is identical to my quora answer: https://www.quora.com/Is-the-breadth-first-search-an-example-of-dynamic-programming)

Let's look at the first problem: You have N oranges. On any given day, you can decide to eat a certain number. This can always be 1. But if you have an even number of oranges, you can eat N/2. If the number of oranges you have is divisible by 3, you can instead eat 2N/3 if you want. What is the minimum number of days in which all the oranges can be eaten?

This problem is typical of dynamic programming problems, and there is a fairly routine dynamic programming solution to it. Here is that solution first.

    #include <algorithm>
    #include <vector>

    int minDays(int n) {
        // days[i] == minimum number of days needed to eat i oranges
        std::vector<int> days(n + 1);
        days[0] = 0;

        for (int i = 1; i <= n; ++i) {
            // Option 1: eat a single orange.
            int minDay = 1 + days[i - 1];
            if (i % 3 == 0) {
                // Option 2: eat 2i/3 oranges, leaving i/3.
                minDay = std::min(minDay, 1 + days[i / 3]);
            }
            if (i % 2 == 0) {
                // Option 3: eat i/2 oranges, leaving i/2.
                minDay = std::min(minDay, 1 + days[i / 2]);
            }
            days[i] = minDay;
        }
        return days[n];
    }

This is an O(n) algorithm. But it turns out that we can do much better. Consider N=12. On the first day, you could eat 8 oranges, or 6 oranges, or just 1. So there are three different paths you could take. For each of those there would be one, two, or three choices to make on the second day. This is the classic dynamic programming structure, but it is also a graph. By traversing this graph breadth-first, it is possible to evaluate these paths and find the first path to reach zero - the fastest way to zero oranges. The following is a BFS solution.

    #include <queue>
    #include <set>

    int minDays(int n) {
        if (n <= 0) {
            return 0;
        }

        std::queue<int> bfsQueue;
        std::set<int> seen;
        bfsQueue.push(n);
        int days = 0;

        while (!bfsQueue.empty()) {
            // Each pass of the outer loop handles one BFS level, i.e. one day.
            int size = bfsQueue.size();

            for (int i = 0; i < size; ++i) {
                int entry = bfsQueue.front();
                if (entry == 0) {
                    return days;
                }

                bfsQueue.pop();

                // Skip orange counts that have already been expanded.
                auto it = seen.insert(entry);
                if (!it.second) {
                    continue;
                }

                // push the child entries
                if (entry % 3 == 0) {
                    bfsQueue.push(entry / 3);
                }
                if (entry % 2 == 0) {
                    bfsQueue.push(entry / 2);
                }
                bfsQueue.push(entry - 1);
            }
            days += 1;
        }
        return days;
    }

I don’t have a complexity number for this. Had the child nodes been all distinct, the complexity would possibly have been O((3k)^d) for some constant k less than 1, where d is the minimum number of days required. This itself grows much faster than O(n). But the fact that many of the child nodes are actually the same - the overlapping sub-problems property of dynamic programming - possibly makes this O(d*log(n)) or something like that. And I could be way off the mark here - I am just speaking from observation and some very rudimentary reasoning.
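
One way to put numbers behind that observation is to run both approaches side by side and count how many distinct orange counts the BFS ever enqueues. The following is only a sketch for experimenting: minDaysDp, minDaysBfs and BfsResult are names I made up here, the BFS marks states as seen when they are pushed rather than when they are popped, and it proves nothing about the asymptotic complexity.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <unordered_set>
#include <vector>

// Bottom-up DP with the same recurrence as the first solution above.
int minDaysDp(int n) {
    if (n <= 0) return 0;
    std::vector<int> days(n + 1, 0);
    for (int i = 1; i <= n; ++i) {
        int best = 1 + days[i - 1];
        if (i % 2 == 0) best = std::min(best, 1 + days[i / 2]);
        if (i % 3 == 0) best = std::min(best, 1 + days[i / 3]);
        days[i] = best;
    }
    return days[n];
}

struct BfsResult {
    int days;                // minimum number of days
    std::size_t statesSeen;  // distinct orange counts ever enqueued
};

// Level-order BFS over orange counts; a state is marked when it is pushed.
BfsResult minDaysBfs(int n) {
    if (n <= 0) return {0, 0};
    std::queue<int> frontier;
    std::unordered_set<int> seen;
    frontier.push(n);
    seen.insert(n);
    int days = 0;
    while (!frontier.empty()) {
        std::size_t levelSize = frontier.size();
        for (std::size_t k = 0; k < levelSize; ++k) {
            int cur = frontier.front();
            frontier.pop();
            if (cur == 0) return {days, seen.size()};
            // -1 marks a move that is not available for this count.
            int next[3] = {cur - 1,
                           cur % 2 == 0 ? cur / 2 : -1,
                           cur % 3 == 0 ? cur / 3 : -1};
            for (int v : next) {
                if (v >= 0 && seen.insert(v).second) frontier.push(v);
            }
        }
        ++days;
    }
    return {days, seen.size()};
}
```

Printing minDaysBfs(n).statesSeen next to n for growing n shows how many of the n + 1 table entries the BFS actually touches, while the DP always fills all of them.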

Using binary search in an optimization problem

Here goes the second problem: You are given a list of positions on a straight line where you can place magnets. The attraction between the magnets is inversely proportional to the distance between them. You are given m magnets and want to minimize the maximum possible attraction between any two magnets when you arrange them. That is the same as saying that you want to maximize the minimum distance between two successive magnets.

The given positions are a constraint. Discounting that for a minute, if you had four magnets that you could place anywhere between positions 1 and 10, how would you do it?

First magnet at 1, last magnet at 10 - that's a given. The second magnet could be at 4, the third at 7. That would make the minimum distance between any two magnets 3. You cannot have any arrangement of four magnets at positions 1-10 in which the smallest distance between two magnets (obviously successive) is 4.

Now how on earth does one solve this? I thought of dynamic programming initially but couldn't frame it as one - maybe it is possible to solve it that way. But if the range of positions is 1 through 10, and there are 4 elements, an equitable distribution of the 4 elements would put a distance of at most (10-1)/(4-1) = 3 between two successive elements. Therefore the minimum distance between two elements can never be more than 3. And of course, the least distance is when they are next to each other - i.e. 1. This means that what we are really trying to find out is whether it is possible to place the elements with a minimum distance of x between them, where 1 <= x <= 3.

Of course, instead of 1 <= x <= 3, the range could be arbitrarily large, say 1 <= x <= 1000000. And that's where you're gonna have to search that state space using something better than a linear scan from 1 through 1000000 (or the other way). Now if it is possible to place the elements with a minimum distance of x, then it goes without saying that it is possible to do so for every distance in [1, x]. So then we are interested in finding out whether it is also possible to do so for some x' in (x, 1000000]. By making minor modifications to the binary search process we can find the largest x satisfying our constraint. Here goes the code.

 

    int maxDistance(vector<int>& position, int m) {
        if (position.empty() || m <= 1 || position.size() < m) {
            return 0;
        }
        std::sort(position.begin(), position.end());
        int first = position.front(), last = position.back();
        if (m == 2) {
            return last - first;
        }

        int max_gap = (last - first)/(m-1);
        int min_gap = 1;
        int max_min_gap = -1;
       
        while (min_gap <= max_gap) {
            auto cur_gap = min_gap + (max_gap - min_gap)/2;
            if (!canFitWithMinGap(position, m, cur_gap)) {
                max_gap = cur_gap - 1;
            } else {
                max_min_gap = cur_gap;
                min_gap = cur_gap + 1;
            }
        }
        return max_min_gap;
    }
   
    bool canFitWithMinGap(const vector<int>& position, int num_elems, int gap) {       
        auto begin = position.begin();
        auto start = *begin;
        int last = position.back();
        for (int i = 0; i < num_elems - 2; ++i) {
            begin = std::lower_bound(begin, position.end(), start + gap);
            if (begin == position.end() || (last - *begin) < gap) {
                return false;
            }
            start = *begin;
        }
        return true;
    }

The idea is simple but it takes a bit of thinking to see this as a viable approach.
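
The pattern can be pulled out of the magnet problem: whenever "is x feasible?" is true up to some threshold and false beyond it, a binary search over x finds the threshold. The sketch below is a generic restatement under that monotonicity assumption. The names largestFeasible, canPlace and maxMinGap are hypothetical, and canPlace uses a plain linear greedy scan instead of the lower_bound-based check in the code above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the largest x in [lo, hi] for which ok(x) is true, or lo - 1 if
// there is none. Assumes ok is monotone: true up to a threshold, then false.
template <typename Pred>
int largestFeasible(int lo, int hi, Pred ok) {
    int best = lo - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (ok(mid)) {
            best = mid;    // mid works; try something larger
            lo = mid + 1;
        } else {
            hi = mid - 1;  // mid fails; so does everything above it
        }
    }
    return best;
}

// Greedy check: can m magnets be placed on the sorted positions with at
// least `gap` between successive magnets?
bool canPlace(const std::vector<int>& sorted, int m, int gap) {
    int placed = 1;
    int prev = sorted.front();
    for (std::size_t i = 1; i < sorted.size() && placed < m; ++i) {
        if (sorted[i] - prev >= gap) {
            prev = sorted[i];
            ++placed;
        }
    }
    return placed >= m;
}

int maxMinGap(std::vector<int> positions, int m) {
    if (m < 2 || positions.size() < static_cast<std::size_t>(m)) return 0;
    std::sort(positions.begin(), positions.end());
    // Upper bound from the equitable-distribution argument above.
    int hi = (positions.back() - positions.front()) / (m - 1);
    return largestFeasible(1, hi, [&](int gap) {
        return canPlace(positions, m, gap);
    });
}
```

With positions 1 through 10 and m = 4 this gives 3, matching the worked example earlier. The same largestFeasible helper can be reused with any other monotone predicate.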



Wednesday, April 03, 2019

An Inexact Introduction to what Envoy is / does

Backstory

Of late I have spent a bit of time on Envoy. They say it's the next big thing for cloud services. It changes how microservices are written and deployed. With that kind of interest and developer traction, you'd imagine that they'd have a fantastic set of tutorials in their docs to get any interested engineer started. Well, they do. But like everything else in this day and age, to read those fantastic docs you already have to know a fair bit about proxies and API gateways and stuff that I feel not every Envoy newbie need know. What is a dummy to do? For one, persevere, and for two, try to demystify it (aka help cut the crap). So this one is essentially about what I have managed to learn about Envoy so far. Precious little, but I'll try summarizing nonetheless.

When you build a software system these days, especially a client-server kind of app, you split it into a few (or many) small components that interact with each other to perform cohesive functions - microservices. If this system has to serve requests as most systems do, then you may need to be able to operate without loss of functionality and responsiveness as the number of requests grows - a quality that's known as scalability. Building cloud services in terms of microservices is the norm today. Microservices confer a great deal of flexibility in how we address availability, responsiveness, and scalability of our cloud services. But leveraging all of it isn't always easy if you're a microservice author. Simply put, there are a lot of cross-cutting concerns that every microservice author has to think about - right from transport layer security and authentication, to discovering peer services, to load balancing, rate limiting - concerns that are not central to the business logic of the microservice. Addressing them is hard enough. Once you consider that different microservices in the same cloud app could be written in different languages or frameworks, the problem becomes harder still.

First look at Envoy

In a nutshell, Envoy, developed at Lyft and written in modern C++, allows you to build microservices without bothering too much about how to route requests to other microservices, how to handle SSL connections, authenticate users, do load balancing, rate limiting, circuit breaking, and lots more (patience, I'll explain all the terms). In other words, it helps address all of the cross-cutting concerns mentioned earlier in a polyglot microservices environment. And it does so in an extensible way allowing anything from hard-coded static configuration to completely dynamic configuration for everything from the endpoints used to serve requests, the clusters serving them, to the load-balancing policies.

Now true to the promise above, I owe a short explanation of some terms I casually threw at you.

Load balancing: You have lots of requests coming in and you want to serve them all responsively and reliably. What you do is create multiple replicas of your service and then route requests to them spreading the load across the replicas. The exact strategy can vary, and usually depends on whether your services are stateful or stateless.

Circuit Breaking: Service A talks to service B in order to serve requests. If B be heavily loaded and unresponsive, or simply unavailable, A owes it to the user to degrade gracefully instead of being hung. A also owes it to a perhaps already-loaded B to not bombard it with even more requests in such a situation. Detecting such a situation, and preventing requests from A to B for a short time, before once again resuming them is what circuit breaking is about.
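
To make the behaviour just described concrete, here is a toy circuit breaker. It is only a sketch of the idea and not how Envoy implements it: the class name, the "N consecutive failures" trigger and the fixed cool-down are all assumptions of mine, and time is passed in as a plain millisecond counter so the logic stays deterministic.

```cpp
#include <cstdint>

// Toy circuit breaker for calls from service A to service B.
// Hypothetical design: trips after N consecutive failures, then blocks
// requests until a fixed cool-down has elapsed.
class CircuitBreaker {
public:
    CircuitBreaker(int failureThreshold, std::int64_t coolDownMs)
        : failureThreshold_(failureThreshold), coolDownMs_(coolDownMs) {}

    // Should A send a request to B at time nowMs?
    bool allowRequest(std::int64_t nowMs) const {
        if (!open_) return true;
        // Once the cool-down is over, let requests through again as probes.
        return nowMs - openedAtMs_ >= coolDownMs_;
    }

    // B answered in time: close the circuit and reset the failure count.
    void recordSuccess() {
        consecutiveFailures_ = 0;
        open_ = false;
    }

    // B failed or timed out: trip (or re-trip) once the threshold is reached.
    void recordFailure(std::int64_t nowMs) {
        ++consecutiveFailures_;
        if (consecutiveFailures_ >= failureThreshold_) {
            open_ = true;
            openedAtMs_ = nowMs;
        }
    }

private:
    int failureThreshold_;
    std::int64_t coolDownMs_;
    int consecutiveFailures_ = 0;
    bool open_ = false;
    std::int64_t openedAtMs_ = 0;
};
```

With CircuitBreaker cb(3, 1000), three recordFailure calls in a row make allowRequest return false for the next second, after which A may try B again. While the breaker is open, A can fail fast or serve a degraded response instead of hanging.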

Rate limiting: You don't want overeager clients to swamp your service with more requests than you can reliably handle. Rate limiting does this using various strategies and algorithms. You reject requests beyond a threshold number per second, or throttle requests by introducing small random delays while routing them. You put such checks at the client, and at the server.
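
One common strategy for the "reject beyond a threshold per second" variant is a token bucket. The sketch below is a generic illustration and not Envoy's rate limiter: TokenBucket and its parameters are hypothetical names, and the caller supplies a non-decreasing millisecond clock starting at zero.

```cpp
#include <algorithm>
#include <cstdint>

// Toy token bucket: tokens refill at ratePerSec up to a maximum of `burst`,
// and every admitted request consumes one token.
class TokenBucket {
public:
    TokenBucket(double ratePerSec, double burst)
        : ratePerSec_(ratePerSec), capacity_(burst), tokens_(burst) {}

    // Returns true if a request arriving at nowMs may proceed.
    // Assumes nowMs never goes backwards between calls.
    bool tryAcquire(std::int64_t nowMs) {
        double elapsedSec = (nowMs - lastMs_) / 1000.0;
        tokens_ = std::min(capacity_, tokens_ + elapsedSec * ratePerSec_);
        lastMs_ = nowMs;
        if (tokens_ >= 1.0) {
            tokens_ -= 1.0;
            return true;
        }
        return false;  // over the limit: reject (or delay) this request
    }

private:
    double ratePerSec_;
    double capacity_;
    double tokens_;
    std::int64_t lastMs_ = 0;
};
```

A TokenBucket(2.0, 2.0) admits a burst of two requests immediately, rejects a third one arriving at the same instant, and admits one more request half a second later.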

Envoy in wee bit more detail

Ok, back to Envoy. So what does Envoy deal in? As in, what are the abstractions or domain objects in terms of which Envoy operates? I ask this, because without being able to satisfactorily answer this question about a given software system, I have noticed I never make a good job of trying to make sense of the system itself. So here are the key abstractions.

  1. Some notion of a gateway - IP address + port + protocol - that downstream clients connect to in order to send requests to multiple services. Envoy calls them listeners.
  2. Some notion of an endpoint - an IP+port pair that serves a specific service. Envoy calls them, surprisingly, endpoints! The endpoints could use raw IP addresses or FQDNs that are resolved via a DNS.
  3. Some concept of a logical grouping of endpoints running a specific service. Envoy calls these clusters.
  4. A notion of routing requests from the gateway to the clusters. Envoy calls these routes and route_configs.
  5. Some concept of a request URL that a client hits, consisting of a virtual host or domain name, an API path prefix, etc. This is used to determine which routing rules are invoked.
  6. Some concept of pluggable middleware for intercepting requests and processing them. Envoy calls them filter_chains and they consist of one or more filters through which each request passes. You can do all sorts of things in these filters, such as handling specific network protocols, authentication, rate limiting, etc.
  7. Policies around load balancing, rate limiting, circuit breaking, etc.

An Envoy configuration typically consists of one or more listeners, each of which defines one or more filter chains. An incoming request would be accepted by a listener and, based on the attributes of the request, one of the filter chains would be used to process the request. Each filter in the matching filter chain would process the request in order. For http requests, the http_connection_manager (hcm) filter is used, while for handling plain TCP requests, the tcp_proxy filter is used. Routes and route_configs are defined within the http_connection_manager or tcp_proxy configuration, and they define how matching requests are forwarded to specific backend services. The backend services are modeled as clusters, each consisting of one or more endpoint addresses of the backend services.
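
As a mental model of how routes tie requests to clusters, here is a deliberately tiny sketch. It is not Envoy's API or configuration format: the structs, the "*" wildcard domain and the first-match-wins prefix rule are assumptions made purely for illustration.

```cpp
#include <string>
#include <vector>

// Toy versions of the abstractions listed above (hypothetical types).
struct Endpoint {
    std::string address;  // raw IP or FQDN
    int port;
};

struct Cluster {
    std::string name;
    std::vector<Endpoint> endpoints;
};

struct Route {
    std::string domain;       // virtual host; "*" matches any host here
    std::string pathPrefix;   // API path prefix to match
    std::string clusterName;  // cluster that serves matching requests
};

// Returns the cluster name of the first route whose domain and path prefix
// match the request, or an empty string when nothing matches.
std::string pickCluster(const std::vector<Route>& routes,
                        const std::string& host,
                        const std::string& path) {
    for (const Route& r : routes) {
        bool domainOk = (r.domain == "*" || r.domain == host);
        bool prefixOk =
            path.compare(0, r.pathPrefix.size(), r.pathPrefix) == 0;
        if (domainOk && prefixOk) {
            return r.clusterName;
        }
    }
    return "";
}
```

With routes for "/api/users" and "/api" listed in that order, a request for "/api/users/42" goes to the first cluster and "/api/orders" to the second. A load-balancing policy would then pick one Endpoint from the chosen Cluster.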

Envoy deployment

So how does Envoy run alongside your own services? Several ways are possible. Most commonly it is deployed to run in both of the following roles:

1. As an API gateway that handles and routes all incoming requests to different microservices. This is called the edge proxy because it sits on the edge of your app boundary. It's a gateway into your app, so to speak.
2. As a peer process of each service, intercepting, qualifying, checking, routing all its incoming and outgoing data. This is called a service proxy. Imagine that you the microservice writer don't need to bother about TLS-secured connections, authentication, discovering service endpoints to talk to, etc. You just identify which other services to talk to and send requests and responses to your peer Envoy on some port on the local machine (technically, in the same network namespace). It takes care of routing those requests.

Now in the overwhelming majority of cases, Envoy would run as a Docker container. As a service proxy, it would likely run as a sidecar container alongside the service container. But it can also run as standalone binaries.

Summary

Thus, Envoy serves as an edge and service proxy. It handles routing of incoming requests and service-to-service requests, and takes care of lots of common concerns. It allows you to write really simple microservices which practically need to do nothing more than get their own business logic right. Now the above is a deliberately dumbed-down version of the truth, because Envoy does a lot more. It can work at both TCP/UDP + SSL level (L3/L4 proxy), as well as at HTTP level (L7). It can handle GRPC, and HTTP/2. And there is much more to it. But at its core, it is a proxy for microservice-based apps that makes routing between services declarative and easy, and adds a whole host of useful services.



Saturday, May 19, 2018

Setting up a private insecure Docker registry

Setting up a private insecure Docker registry for your Kubernetes sandbox

Well, there's nothing specific to Kubernetes about this article. It just shows you how to quickly set up an insecure docker registry locally on one of your VMs. But if you do have a local Kubernetes setup on a set of VMs as described in my last article, then setting up a local docker registry, and pushing to it all the images you intend to deploy to your k8s cluster, would save you precious bandwidth (and time).

Running the docker registry

Pick a node to run your docker registry. I usually pick the master node of my kubernetes cluster arbitrarily, but it can really be any accessible node. The only thing that you need to make sure is that it has a fully-qualified domain name or a stable IP (statically assigned or a DHCP IP that's configured in your router to be sticky based on MAC address).
The actual docker registry is best run as a container, using the registry:2 image. Bind mount a host directory to /var/lib/registry to store the images durably.
$ mkdir -p /var/lib/local-registry/registry
$ docker run -d -p5000:5000 --restart=always --name local-registry -v /var/lib/local-registry/registry:/var/lib/registry registry:2

Accessing the registry from your k8s nodes

The private registry you just configured needs to be accessible from your k8s nodes, so that they can all pull images from it. Because the registry you just configured is an insecure registry and does not use TLS, you must tell the docker daemon of each individual node that needs access to this registry that it should access it via http and not via https.
Assuming that the host on which you're running the registry has a hostname of reghost (you can also use an IP address), the way to do that would be the following:
$ [ -w /etc/docker/daemon.json ] && \
  jq '."insecure-registries" = [."insecure-registries"[], "reghost:5000"]' /etc/docker/daemon.json >/tmp/daemon.json && \
  mv /tmp/daemon.json /etc/docker/daemon.json
The above assumes that you have the jq utility, which is a totally cool json utility that you should master. This would not have succeeded if the file didn't exist already. In that case:
$ [ -w /etc/docker -a -r /etc/docker -a ! -f /etc/docker/daemon.json ] && cat > /etc/docker/daemon.json << EOF
{
  "insecure-registries": ["reghost:5000"]
}
EOF
On older versions of Docker, the following seems to work on RedHat / Fedora / CentOS based systems:
$ [ -w /etc/sysconfig/docker ] && cat >> /etc/sysconfig/docker << EOF
INSECURE_REGISTRY='--insecure-registry reghost:5000 <<and-more>>'
EOF
Note that if you want to access your registry without the port number (because docker uses 5000 as the default port), you would need to list it separately in the insecure-registries key, in addition to the one qualified with the port number. Following this, on every such node where you added or edited /etc/docker/daemon.json, you need to restart the docker daemon:
$ systemctl daemon-reload
$ systemctl restart docker

Pushing images

You should now be ready to push images to this registry. Assuming you created a local image called mywebserver, you could try this:
$ docker tag mywebserver reghost:5000/myuser/mywebserver:latest
$ docker push reghost:5000/myuser/mywebserver:latest
The above will push all the layers of your image to your registry. You should be able to verify the addition of new content on your registry under /var/lib/local-registry/registry (or whichever path you bind mounted in your registry container).
Et, voila!


Saturday, February 17, 2018

Setting up vim autocomplete for golang

The key is to know which keystrokes to use, and you should use Ctrl-X followed by Ctrl-O in insert mode to bring up completion suggestions in a pop up. If you want automatic pop ups as you type, YouCompleteMe is supposed to work. I couldn't get it to work, and it wouldn't work on some of my setups because it requires a more recent version of vim (7.4.15xx) than I have.

So what was needed? Assuming your GOROOT and GOPATH are correctly set up (if not, see below), the following is all that you need to do:
mkdir -p ~/.vim/autoload
curl -LSso ~/.vim/autoload/pathogen.vim https://tpo.pe/pathogen.vim
echo "call pathogen#infect()" | cat - /etc/vimrc > ~/.vimrc.new
mv ~/.vimrc ~/.vimrc.old && mv ~/.vimrc.new ~/.vimrc
go get github.com/nsf/gocode
mkdir -p ~/.vim/bundle
git clone https://github.com/fatih/vim-go.git ~/.vim/bundle/vim-go
cd $GOPATH/src/github.com/nsf/gocode/vim
./update.sh
Then, open a vim session and type the following vim command:
:GoInstallBinaries
The above will install additional go command-line tools and you should be all set. Just one more thing. Open your /etc/vim/vimrc file (if you have root privileges) or your ~/.vimrc file (which you could copy from /etc/vim/vimrc) and add or uncomment the following lines:
filetype plugin on
if has("autocmd")
  filetype plugin indent on
endif

You can see what it should look like when you edit go code and press Ctrl-X followed by Ctrl-O in insert mode.


The screenshot gif was created on Ubuntu 16.04 using peek, along with gifski.

Setting up your Golang development environment

Installing the latest golang development environment (currently go 1.9) and setting it up involves the following steps:
curl -L0 https://storage.googleapis.com/golang/go1.9.linux-amd64.tar.gz > go1.9.linux-amd64.tar.gz
tar xfz go1.9.linux-amd64.tar.gz -C /usr/local

GOROOT=/usr/local/go
GOPATH=~/devel/apporbit/go
PATH="$GOROOT/bin:$PATH:$GOPATH/bin"
export PATH GOROOT GOPATH
The key is to have GOPATH set correctly. Also set up glide for better vendoring / package dependencies.
go get github.com/Masterminds/glide
You create / checkout all your projects under $GOPATH/src. If you clone a github project, it should be under $GOPATH/src/github.com/<user>/<repo>. Using go get to get go packages would also clone the repos under these paths.


Wednesday, November 29, 2017

Java alphabet soup


Trying a Java refresher, more specifically a Spring refresher, has so far been a source of mixed emotions. Having written a lot of Core Java in the past and used a smattering of Spring, the utility of either wasn't in question in my head. But then having developed tons of Ruby on Rails apps, and seen both its magic and seamy sides, my perspective perhaps has more dimensions to it today.

I found the Java / Spring way of developing web applications a little too archaic by today's standards. All the annotations have still not exorcised much of the esoteric XML that you still cannot avoid. But Eclipse provides significant relief to the extent that you can get away without writing perhaps a single line of XML, using its Maven and Spring (Spring Tools Suite / Web Tools Platform) plugins instead to add most of the XML content.

You still need to manually configure Tomcat, manually configure data sources and JNDI for accessing your data sources in server specific ways. You still have to manually configure your web.xml. Those are hard, and you have to know at least what to search for in google, and then what to search for in the documentation that google lists. It is frustrating if you don't have a very good, precise tutorial or howto. It all shows the struggles of an old and evolving framework in trying to remain usable even as it remains solidly relevant. I suspect Spring Boot, something I am yet to explore, would bear signs of some true-north evolution. In the meantime, one hopes Java comes up with more modern web frameworks. Spring is mature, stable, mildly usable and very useful. But it feels a decade behind. Spring addressed usability problems of Java EE. But today, something else needs to do that for Spring. Spring Boot may be an evolutionary step in that direction, but more needs to be done.


Saturday, November 11, 2017

Setting up local repos for Ubuntu packages





Setting up repos of .deb packages for Ubuntu may not be something people need to do often on their own laptops. But I was recently doing something where it made sense. I was setting up a kubernetes cluster using VirtualBox on my Linux laptop and wanted to automate the whole process using vagrant and ansible. This meant that each time a VM would be spun up, I would add apt repositories to it and then install docker.io, kubeadm, kubectl, kubelet, and kubernetes-cni packages. All of these VMs were to be on my laptop, and each time they'd reach out to google's or docker's repos to pull these packages in. A sum total of around 70 MB isn't big but I could be spinning up tens of VMs over the course of my experiments and a fresh download off the web every time is a terribly inefficient use of bandwidth (certainly here in India). So I wanted to setup an apt repository locally on the host laptop which runs Ubuntu 16.04 (xenial).

The plan

On the host: What we need to do is to run a web server (Apache2 works fine) that serves a directory with the packages that I want to host. To make this secure, we need to generate a key pair using GnuPG, and sign the packages I want to expose. I should also expose the public key. The packages are .deb files (the analogues of .rpm on RedHat / Centos, etc.) and in case you've already installed them on your host laptop, they might be available under the directory /var/cache/apt/archives. If not you can download the packages without installing using a command-line switch for apt install.

 
On the guests: On each guest, we'll need to edit the /etc/apt/sources.list file to include the repository from the host laptop. We'll also need to accept the public key from this repository.

The action

We shall follow the plan outlined above.

Actions on the host

Setting up the webserver and packages

  1. The following will install the Apache 2 webserver and other prerequisites.

    $ sudo apt install apache2 dpkg-dev dpkg-sig
     
  2. The root virtual directory for Apache is by default /var/www/html. You should create the following directory tree under it: /var/www/html/pkgs/dists/$(lsb_release -s -c)/main/binary-amd64. To do that, run the command:

    $ mkdir -p /var/www/html/pkgs/dists/$(lsb_release -s -c)/main/binary-amd64
     
  3. Download the required packages in the directory you just created:
     
    $ cd /var/www/html/pkgs/dists/$(lsb_release -s -c)/main/binary-amd64
    $ apt download 
    

Setting up your key pair and signing packages

  1. If you haven't already, you should generate a key pair using GnuPG.
     
    $ gpg --gen-key
    

    In the prompts that follow, select the key type to be RSA (sign only), key size to be 4096 bits, key does not expire, and specify a unique name. Specify a reasonable password. Wait for the keys to be generated.

  2. Once the key is created, run:
     
    $ gpg --list-keys
    

    Note the line starting with pub like this one:
     
    pub   4096R/B1B197AF 2017-11-11
    

    Note the number appearing after the size specifier (4096R/). Here it is B1B197AF. This would be your key identifier and would be used in the next step.

  3. Generate your public key file so that it is accessible through your web server:
    $ cd /var/www/html/pkgs
    $ sudo gpg --output keyFile --armor --export B1B197AF
     
  4. Finally, sign each of your packages thus:
     
    $ cd /var/www/html/pkgs/dists/$(lsb_release -s -c)/main/binary-amd64
    $ sudo dpkg-sig --sign builder <package-file>.deb
    

    Also generate the Packages.gz, Release, InRelease and Release.gpg files.
     
    $ cd /var/www/html/pkgs/dists/$(lsb_release -s -c)/main/binary-amd64
    $ dpkg-scanpackages . | gzip -9c > Packages.gz
    
    $ apt-ftparchive release . > Release
    $ gpg --clearsign -o InRelease Release
    $ gpg -abs -o Release.gpg Release
    

Actions on the guest

  1. Run the following command to add the local repository:
     
    $ sudo su
    $ cat <<EOF >> /etc/apt/sources.list
    > 
    > deb [ arch=amd64 ] http://laptop-host-ip/pkgs xenial main
    > EOF
    

    One point to understand here is that you would want your repository to be chosen preferentially over others whenever it carries the package. So unless the packages of interest are absent from the other listed repositories, you should put this entry above any other repositories. The above command appends your repository at the end of the file. To move it up in the file, you may need to edit it manually.

  2. Add the laptop host repository's key:
     
    $ wget -qO - http://laptop-host-ip/pkgs/keyFile | sudo apt-key add -
     
  3. Now run the following:
     
    $ sudo apt update
    

    Once this step passes, you can run:
     
    $ sudo apt install <package-name>
    


Friday, February 19, 2016

Stuff worth exploring

Tools worth exploring:

Unix

libtool
ldconfig
pkg-config

C++

C++1y

Scale / network I/O

seastar
wangle
folly
Mellanox libvma


Java






Tuesday, September 15, 2015

Git in sixty-odd (was thirty-odd) questions

Moving to git from another version control system like SVN, Perforce or CVS often brings teething pains that don't go away as soon as you'd like them to. But git is a fabulous tool and learning it is worth your while. The following FAQ tries to address frequently encountered scenarios that programmers face when moving to git. It assumes that you understand what a version control system is and have some experience using another system like SVN, Perforce or CVS.

In the Unix tradition, this FAQ is terse rather than elaborate. We don't do visualization of branches and branching models, etc. here. There are better tutorials dedicated to them. The expectation is that in this FAQ you will find answers to the most frequent scenarios that you encounter and know which git command to use. And you should be able to cobble together these commands to solve other scenarios that can be decomposed into these.

  1. Why is Git so complex?
    Two reasons for it:
    a. Git is completely distributed - so each developer has a copy of the entire repository and is responsible for keeping this in sync with other repositories.
    b. As a consequence of #a, operations in Git do not easily map to operations in CVS or Perforce or other popular version control systems.

  2. How do I start on a new project?
    The most common way is to clone a remote repository. Git supports http, ssh, etc. You need a git URL for a remote git repository and a local directory in which you want to create the cloned workspace. You can then issue the following command:

    $ cd local_dir
    $ git clone git_url


    This remote repository can be any repository. Usually there is a designated central repository shared by all members of a development team. But it could as well be another clone of that repository hosted on a different machine.

    The remote repository from which the local repository is cloned is conventionally called origin.

    Before your first commit, set the name, email and editor that git should use:

    $ git config --global user.name "Your Name"
    $ git config --global user.email "your@email.com"
    $ git config --global core.editor "path-to-your-fav-editor"
    



  3. How do I check out a file if I need to change it in my workspace?
    There is no concept of checking out a file. You cannot lock any files for edit. Just edit it in your workspace.



  4. How do I keep track of which files I've modified?
    In your workspace, run the following command.

    $ git status

    It will list files that have been modified, as well as new files that have been created in your workspace.



  5. What is the index and what is meant by staging files?
    When you add a new file, it is an untracked file. When you modify an existing file, it is a modified file. To indicate your intent of committing them, you need to stage them. You stage a file by running:

    $ git add file_or_dir

    The index is simply the set of changes you've staged. It is also called the staging area. If you've removed files, to stage the removals too, use the --all switch:
    $ git add --all file_or_dir
    



  6. How can I unstage a staged file?

    If you added a new file, you can unstage it thus:

    $ git rm --cached file_name

    You can unstage all files in the index by running:

    $ git reset HEAD

    To unstage a specific modified file, run:

    $ git reset -- file_name
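
    The staging and unstaging commands from #5 and #6 can be tried out safely in a throwaway repository. The following is only a sketch: the scratch directory and the file names tracked.txt and fresh.txt are made up for the illustration.

```shell
set -eu
# Scratch repository so the demo cannot touch real work (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

echo "v2" > tracked.txt        # modify an existing file
echo "new" > fresh.txt         # create a new, untracked file
git add tracked.txt fresh.txt  # stage both
staged=$(git diff --cached --name-only | sort | xargs)

git reset -q -- tracked.txt    # unstage the modified file
git rm -q --cached fresh.txt   # unstage the newly added file
after=$(git diff --cached --name-only)
content=$(cat tracked.txt)

echo "staged before: $staged"
echo "staged after:  ${after:-nothing}"
echo "tracked.txt still contains: $content"
```

    Unstaging only touches the index: the edits stay in the workspace, and fresh.txt goes back to being an untracked file.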



  7. How do I discard my changes to a modified file?

    By discarding, you jettison your changes and replace them with the latest committed copy:

    $ git checkout -- file_name

    However, if the file has already been staged, you will need to unstage it first before being able to discard it (see #6 above).



  8. How do I discard all my local changes?
    Run the following:
    $ git reset --hard HEAD
    
    This discards both staged and unstaged changes to tracked files. If you want to discard only your unstaged changes and keep what you have staged:
    $ git checkout -- .
    
    The following pair of commands has the same effect, going through the stash:
    $ git stash save --keep-index
    $ git stash drop
    
  9. How do I check-in a file I have modified?
    The term check-in is not used in git. Instead you do a two-step synchronization of your local repository with the remote repository.

    a. First you make sure that all files you have modified or created in your workspace are committed to your local repository. You do this thus:

    $ git add filename  # necessary if it's a new file
    $ git add dirname   # recursive


    You call git add on files that are newly added, as well as those that you've modified. These files are now said to be staged.

    b. Next, you commit all these files to your local repository using git commit.

    $ git commit -a -m "Commit message"

    The -a switch ensures that any modified files are automatically added without the need for calling git add explicitly. You still have to call git add on newly added files.

    Each commit is assigned a SHA1 hash (because in a distributed system like git, adding incremental version numbers, which requires global ordering, is very difficult).

    The symbolic name HEAD refers to the last commit on the current branch. HEAD^ refers to the parent of HEAD. HEAD~N refers to the Nth commit before HEAD (not including HEAD).



  10. How do I check what files I committed locally?
    $ git log

    This command prints the list of commits in your local repository including their hash values and commit notes. If you want to see the specific files committed and their diffs with the origin, run the following command.

    $ git diff origin..HEAD

    You can also check what changed between two commits:

    $ git diff commit1-sha1 commit2-sha1


  11. How do I keep my repository in sync with the remote repository?
    $ git pull remote-name branch-name

    This would often (but not always) take the form of:

    $ git pull origin master

    Read on for more details.



  12. I have commits in my local repository and there are new commits in the upstream branch that I don't have in my local repo. How do I merge everything while keeping my commits?
    The following would still work.
    $ git pull remote-name branch-name

    Under the circumstances, this would create a merge commit - marking the re-convergence point of the two divergent branches in the local and the remote.

    However, many repos avoid merge commits because they do not represent a change of their own. Instead you could rebase your local commits on top of the latest from the remote branch as you pull it.
    $ git pull --rebase



  13. How do I check the diff for staged files? What about files from a specific commit?
    To check the diff for your staged changes, use:

    $ git diff --cached

    To check the diff for a specific commit, use:

    $ git show commit-sha1

  14. How do I check the changes in a particular file between two commits?
    $ git diff commit1-sha1 commit2-sha1 -- path_to_file
    



  15. I realized that my commit went wrong. How do I fix it?
    Maybe you left something out that should have made it to your commit. Or you had to make a change to a file that was already in the commit and you want to include that change. Or perhaps it is just the message you wanted to change. Or all of these? Use "git commit --amend".

    $ git add new_files
    $ git commit -a --amend   # only adds files
    $ git commit --amend -m "message"   # only changes message
    $ git commit -a --amend -m "message"  # changes message, adds files

    Amending a commit replaces it with a new commit with a new SHA1 id. Thus amending should be reserved for private branches, or else any branches based off a commit that is replaced by an amend operation would be difficult to manage.
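
    To see that an amend really replaces the commit rather than editing it, one can compare the commit id before and after. This is a minimal sketch in a scratch repository; a.txt and b.txt are hypothetical file names.

```shell
set -eu
# Scratch repository (hypothetical setup for the illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "one" > a.txt
git add a.txt
git commit -q -m "Add a.txt"
before=$(git rev-parse HEAD)

echo "two" > b.txt                 # the file that was left out of the commit
git add b.txt
git commit -q --amend -m "Add a.txt and b.txt"
after=$(git rev-parse HEAD)

count=$(git rev-list --count HEAD)                   # still a single commit
files=$(git show --name-only --format= HEAD | xargs) # files in the amended commit

echo "before: $before"
echo "after:  $after"
echo "commits on branch: $count, files in commit: $files"
```

    The branch still has one commit, but its SHA1 is new, which is why anything based on the old id is left dangling.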



  16. Now that I have committed my changes to my local repository, how do I check them into the remote repository?
    You don't check-in. You sync with the remote repository using git push. The general command is:
    $ git push remote-repo local-branch



  17. How can I refer to additional remote repositories, pull from them, push to them?
    First you add a remote:
    $ git remote add <remote-name> https://github.com/another-user/repo.git
    
    Next you can fetch from the remote:
    $ git fetch <remote-name>
    
    This gets you the commits from this repo. Now you can refer to branches and commits from this remote repo locally - merge changes into yours, and push your branches to this remote.
    $ git merge <remote-name>/<branch-name>
    
    $ git push <remote-name> <local-branch-name>
    
    You can also remove all references to commits and branches from this remote repository by deleting the remote itself.
    $ git remote remove <remote-name>
    



  18. How do I delete or rename a file that's already checked in?

    To delete a file:

    $ git rm path_to_file

    To rename a file, or to move several files into a directory:

    $ git mv path_to_file path_to_target
    $ git mv path_to_file1 path_to_file2 ... target_dir
    

    You need to then commit this change and push it upstream.

  19. How do I see the history of a moved file? Git log only shows the history after it was moved.

    Use the --follow option with git log.

    $ git log --follow -p path_to_file
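
    The effect of --follow is easy to observe by counting the commits that git log reports for a renamed file with and without the switch. A small sketch, assuming a scratch repository and the made-up file names old_name.txt and new_name.txt:

```shell
set -eu
# Scratch repository (hypothetical setup for the illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "hello" > old_name.txt
git add old_name.txt
git commit -q -m "Add file"
git mv old_name.txt new_name.txt
git commit -q -m "Rename file"

# Number of commits listed for the new path, without and with --follow.
plain=$(git log --format=%s -- new_name.txt | wc -l | tr -d ' ')
followed=$(git log --follow --format=%s -- new_name.txt | wc -l | tr -d ' ')

echo "without --follow: $plain commit(s); with --follow: $followed commit(s)"
```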

  20. How do I remove some changes from my commit?

    Use the new git restore command that was introduced in git 2.23 if you have access to it:

    $ git restore --source <commit> --staged -- path_to_file

    The commit specified with --source could be the previous commit (HEAD^) or another commit. The index entries for path_to_file are reset to the state they had in that commit. Since we specified only the --staged option, the working tree is left alone, so your edits remain there as unstaged changes. Running git commit --amend afterwards records the commit without those changes. If you also specify --worktree, the working tree copy of path_to_file is overwritten as well.


  21. You keep mentioning "branch", but I don't know what a branch is.
    A branch is a fork of a source tree. When you clone a repository, your local repository has a single branch called master. You can then create more branches from it as follows:

    $ git branch new-branch-name

    This one creates a branch from the HEAD of your current branch, but you can also create branches from other branches, or from specific commits.

    $ git branch new-branch-name commit-ref

    The commit ref here can be the name of another branch (referring to the HEAD of that branch) or the SHA1 of a specific commit.








  22. Why do I need branches?
    Short answer: if you have used Perforce: think of them as Perforce changelists with history.

    You need branches to isolate and streamline your feature development, as well as handle releases. You should develop features and commit them to branches.

    You can switch between multiple branches in your workspace. If you want to switch to branch mybranch, use git checkout.

    $ git checkout mybranch

    This will swap the current contents of your workspace with that of the mybranch branch.

    * If there are uncommitted changes in your workspace and you switch to a different branch using checkout, the switch will fail if it requires clobbering your uncommitted changes.

    You can also check all the branches by simply using:

    $ git branch

    Your current branch will be marked with an asterisk.



  23. I have some uncommitted changes in my master branch workspace but I now want to put them in a different branch. How do I do it?
    Just create a branch and switch to it. You can do both in a single command:

    $ git checkout -b new-feature-branch



  24. How do I make sure that my branch is updated with changes from another branch?
    Let us suppose you're working on branch my-branch. You can sync your master with the origin:

    $ git pull origin master

    and then merge changes in your master into your my-branch:

    $ git checkout my-branch  # switch to target branch
    $ git merge master        # merge from source branch


    Or, if you want to merge the contents of my-branch into master, you could do:

    $ git checkout master   # switch to target branch
    $ git merge my-branch   # merge from source branch


    A better way of keeping your branch updated is to use rebasing (see below).



  25. How do I resolve conflicts in git?
    Use a tool like kdiff3 or meld. Download and configure it using git config, then invoke git mergetool.

    $ git config --global merge.tool kdiff3
    $ git config --global mergetool.kdiff3.cmd '"C:\\Program Files\\KDiff3\\kdiff3.exe" $BASE $LOCAL $REMOTE -o $MERGED'
    $ git mergetool

    git mergetool runs the configured merge tool on every file with merge conflicts in the current workspace.



  26. I committed something to my master branch but now want those changes in a different branch but not in master. How do I do it?

    If those changes were not followed by other changes that you would like to retain in the master, then you can do this.

    a. Create branch from master.

    $ git checkout -b new-feature-branch

    b. Reset master to the last commit before your changes.

    $ git checkout master
    $ git reset --hard last-retained-commit-sha1


    Never run reset on a branch you share with other developers. Just like git commit --amend, git reset removes some commits and if such a removed commit is the baseline for some other branch, then that presents a difficult scenario to recover from.
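
    The two steps above can be rehearsed in a scratch repository before trying them on real work. In this sketch the file f.txt is hypothetical, and the first branch is forced to be called master so that it matches the text.

```shell
set -eu
# Scratch repository (hypothetical setup for the illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "base" > f.txt
git add f.txt
git commit -q -m "Base commit"
keep=$(git rev-parse HEAD)                # last commit to retain on master

echo "feature" >> f.txt
git commit -q -am "Feature work committed to master by mistake"
feature=$(git rev-parse HEAD)

git checkout -q -b new-feature-branch     # a. the new branch holds the feature commit
git checkout -q master
git reset -q --hard "$keep"               # b. master goes back to the retained commit

master_head=$(git rev-parse master)
branch_head=$(git rev-parse new-feature-branch)
echo "master:             $master_head"
echo "new-feature-branch: $branch_head"
```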



  27. Can I push a branch to the remote repository?
    Yes. Use git push as shown below:

    $ git push origin my-local-branch



  28. How can I delete the remote branch I pushed, but retain the local branch?
    Use the colon-prefixed branch name with git push.

    $ git push origin :my-local-branch


  29. How can I delete a local branch?
    $ git branch -d my-local-branch

    The above will delete the local branch only if all commits in the branch are also part of at least one other locally known branch. Sometimes, this would require you to pull other remote branches which were not locally updated to ensure that those commits are visible in other branches locally, and then retry the command. If you don't care and just want to delete the branch anyhow, then use:

    $ git branch -D my-local-branch



  30. How do I modify older commits?
    If you can, don't. Work on a single commit per branch, keep amending as needed. Push upstream when you're done. If you still have to tinker with older commits as you sometimes need to do, read the answer to the question "What is the use of rebasing?" below.



  31. How do I trace which commit by which user changed a particular line in a file?
    $ git blame file_name

    It's usually a better idea to use the -w switch to ignore whitespace changes and -M to detect moved lines.
    $ git blame -M -w file_name



  32. How can I undo a commit?
    You can always undo manually and check-in. When it is the last commit, or you can tell that undoing an older commit will not cause conflicts with later commits, then you can use git revert.

    $ git revert HEAD
    $ git revert commit-sha1


    This creates a new commit (you don't need to separately call git commit after this), that undoes the previous commit.
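
    A short sketch of what git revert leaves behind, assuming a scratch repository and a hypothetical file f.txt. The --no-edit switch simply accepts the default revert message so that no editor opens.

```shell
set -eu
# Scratch repository (hypothetical setup for the illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "good" > f.txt
git add f.txt
git commit -q -m "Good state"
echo "bad" > f.txt
git commit -q -am "Bad change"

git revert --no-edit HEAD > /dev/null   # adds a commit that undoes "Bad change"

content=$(cat f.txt)
count=$(git rev-list --count HEAD)
echo "f.txt contains '$content' after $count commits"
```

    History is preserved: the bad commit is still there, followed by a commit that cancels it, which is why revert is safe on shared branches.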


  33. What is the use of rebasing?
    Rebasing changes the baseline commit of a branch. It is a cool way to merge branches which cleans up commit history nicely. Features are usually developed on separate branches. It is possible that one branch gets merged into the master while development on the other branch is still in progress. When you are done on the other branch and want to merge it back to the master too, you realize that its baseline commit is quite old. At this point you either merge the master into the branch (as suggested but discouraged in #24) and then merge it back to the master, or you rebase the branch. The former approach results in extraneous commits on account of the merge and could bury your own commits in a barrage of other commits that came in the merge. This is what rebase aims to avoid. To rebase a branch, you run the following command:

    $ git checkout branch-to-rebase
    $ git rebase commit-ref


    The commit-ref identifies the baseline commit you want to rebase your branch to. It could be a commit SHA1, or a branch name, etc. Frequently, you would want to rebase your branch to the head of a parent branch, so you would simply use the name of the parent branch for commit-ref. This effectively moves the baseline of your branch to the head of its parent branch (or to whatever commit you specified). A merge into the parent following this would be a fast-forward merge and produce a clean history.

    Rebase removes some commits and creates new ones in their lieu at different points in the branch. If there are sub-branches based on any of those commits that are removed in a rebase operation, then recovering those branches could be complex.
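
    The following sketch walks through the scenario described above: master moves on while a feature branch is in progress, the feature branch is rebased, and the merge back is then a fast-forward. The branch name feature and the file names are assumptions made for the illustration.

```shell
set -eu
# Scratch repository (hypothetical setup for the illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

git checkout -q -b feature
echo "f" > feature.txt
git add feature.txt
git commit -q -m "Feature commit"

git checkout -q master
echo "m" > master.txt
git add master.txt
git commit -q -m "Master moved on"
master_head=$(git rev-parse master)

git checkout -q feature
git rebase -q master                      # move feature's baseline to master's head
base_after=$(git merge-base master feature)

git checkout -q master
git merge -q --ff-only feature            # succeeds only as a fast-forward
commits=$(git rev-list --count HEAD)
merges=$(git rev-list --merges --count HEAD)
echo "commits: $commits, merge commits: $merges"
```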



  34. I hurriedly reset my branch back by a couple of commits and now I realize I lost some important changes? Is there a way to retrieve the lost information?
    There actually is. Git records every position HEAD has been at in the reflog, and by default keeps the entries for commits that are no longer reachable for 30 days. We can find the SHA1 hash of such a commit by using:

    $ git reflog

    From its output, we can identify the SHA1 of a commit that has since been lost due to the reset. For a slightly more detailed output that helps you identify the context of each commit, you can run git log -g branch_name. We can then run git reset with that SHA1 as shown:

    $ git reset --hard lost-SHA1
    In case reflog didn't show you what you were looking for (an unlikely event), try your luck by running:
    $ git fsck --full
    

    Then go through the listed dangling commits, and blobs (for stashes), and use git show commit-sha1 to list contents.
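
    A minimal sketch of the reflog recovery, assuming a scratch repository and a hypothetical file f.txt. Instead of reading the SHA1 off the git reflog output by eye, it uses HEAD@{1}, which is reflog notation for "where HEAD was one move ago".

```shell
set -eu
# Scratch repository (hypothetical setup for the illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "1" > f.txt
git add f.txt
git commit -q -m "First"
echo "2" > f.txt
git commit -q -am "Second"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD^                 # the hurried reset: "Second" is gone from the branch

recovered=$(git rev-parse 'HEAD@{1}')     # previous position of HEAD, taken from the reflog
git reset -q --hard "$recovered"          # bring the lost commit back
restored=$(cat f.txt)
echo "recovered $recovered, f.txt contains $restored"
```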



  35. Could I encounter merge conflicts during rebasing and what do I do then?
    Yes you could, and you have the option of either resolving the conflicts (see #25) and completing the rebase, or discarding the attempt to rebase. Once you have resolved the conflicts, you use the following command to continue the rebase:

    $ git rebase --continue

    Sometimes the conflict could be bad enough that you have to ditch the attempt to rebase. You could do that with:

    $ git rebase --abort



  36. Is there a way to save my uncommitted changes and work on another feature? OR What is git stash?
    Of course there is. git stash provides the easiest way of doing that. When you run git stash, it saves your uncommitted changes and lets you start on a clean workspace that is in sync with your HEAD. Once you're done with these changes and have committed them, you could continue on your old changes with git stash pop. You can also use it as a "rebase" for your uncommitted changes, if there are new changes upstream that need to be pulled into your working branch:

    $ git stash
    $ git pull remote-name branch-name
    $ git stash apply
    $ git stash drop

    In case git stash apply encounters a conflict, you need to manually merge and then call git stash drop. Likewise, if you have to rebase your branch and have some unfinished work that you'd like to continue on after rebasing, use the following commands:

    $ git stash
    $ git rebase parent-branch
    $ git stash apply
    $ git stash drop
    

    You can use git stash pop in place of the command pair git stash apply/drop. You can discard your stashed changes without applying, by simply issuing git stash drop.
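
    The stash round trip can be sketched as follows in a scratch repository. The file names f.txt and g.txt are hypothetical; the commit in the middle stands in for "the other feature" you switch to.

```shell
set -eu
# Scratch repository (hypothetical setup for the illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "base" > f.txt
git add f.txt
git commit -q -m "Base"

echo "wip" >> f.txt                       # unfinished, uncommitted work
git stash -q                              # save it away
clean=$(git status --porcelain)           # empty: workspace matches HEAD again

echo "other" > g.txt                      # work on something else and commit it
git add g.txt
git commit -q -m "Other feature"

git stash pop -q                          # bring the unfinished work back
last_line=$(tail -n 1 f.txt)
remaining=$(git stash list)               # empty: pop also dropped the stash
echo "last line of f.txt: $last_line"
```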



  37. I want to modify a commit (or maybe multiple commits) on a branch and they are not the last commit. What do I do?
    Use git rebase -i.
    $ git rebase -i      # Opens editor with list of commits
                         # Change pick to edit for commits you want to edit
    $ git commit --amend # at each commit, after adding files if any and editing message
    $ git rebase --continue # at each commit
    

  38. I have some modified files (or added files) that I am yet to stage. Trying to pull / merge on my branch fails. What do I do?
    One simple approach is to use stash.
    $ git stash   # this saves your work without committing it
    $ git pull origin master # or git merge some-branch
    $ git stash pop  # or git stash apply + git stash drop
    
  39. How do I see what all is stashed, peep into individual stashes, and apply stashes selectively?
    To list stashes:
    $ git stash list  # lists all stashes, most recent first
    stash@{0} ...
    stash@{1} ...
    ...
    

    Now you can check the content of a specific stash:
    $ git stash show -p stash@{1}
    

    You can also apply a specific stash:
    $ git stash pop stash@\{1\}  # escape the braces
    

    Why would you need to apply specific stashes? Stashes are added to a global list. Say you alternate between two branches A and B, and you stash your work on each branch into stashes s_A and s_B. If you now run:
    $ git stash pop
    

    it will always apply the last stash on whatever branch you're in. If the last stash you took was s_B on branch B, but you then call git stash pop on branch A, then it will apply s_B on branch A. This is not what you usually want. Instead, you should take care to list the stashes out, and apply the correct stash explicitly to the correct branch.

  40. I have some staged files that I'm yet to commit. At this stage I need to pull / merge / rebase on my branch, but it fails because of the staged changes. What do I do?
    Unstage your changes.

    $ git reset HEAD

    Then follow #38 above. However, if your repo has no commits yet, the reset won't work as is. Instead you have to run the following command (but be very careful about running it under any other circumstances because you lose all previous commits):

    $ git update-ref -d HEAD

  41. I committed some files on my local branch but now I want to change them further and not push these commits upstream. Plus I may need to pull / merge / rebase. What do I do?
    Identify the last commit you want to retain - say last-good-commit-sha1. Run the following:
    $ git reset last-good-commit-sha1
    

    Your changes are now unstaged. In order to now do any merges on your branch, follow #38. If no merges are needed, just make the necessary changes and create a fresh commit. Also see git commit --amend, git rebase and git rebase -i. Any command that changes commit history - whether git reset, git rebase or git commit --amend - should be issued only on local branches to edit commits that have not been pushed upstream.

    To push such commit history changes upstream, you have to use:

    $ git push -f ...
    

    In many repositories, git push -f ... is disabled and for good reason.

  42. I have some changes committed to one branch which I want to pull into another branch without pulling the whole branch?
    Identify the individual commits that you need to pull. Note their sha1 ids. Switch to the target branch and then use the git cherry-pick command:
    $ git checkout target-branch
    $ git cherry-pick -x <commit1-sha1>
    $ git cherry-pick -x <commit2-sha1>
    $ git cherry-pick -x <commit3-sha1>
    ...
    

    Your changes are now committed to your local branch. The commit ids are different from the ones you cherry-picked though. You now need to push your branch upstream. The -x option appends a line of the form "(cherry picked from commit <sha1>)" to each new commit message. It does not change how git merges the branches later, but it records where each change came from, which helps when your current branch and the branch you cherry-picked from need to be merged or compared at a later point in time. If that's not your concern, leave out -x.
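
    A small sketch of a single cherry-pick, assuming a scratch repository, a hypothetical branch called source-branch and made-up file names. It picks one commit out of two and then inspects what -x added to the commit message.

```shell
set -eu
# Scratch repository (hypothetical setup for the illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "base" > f.txt
git add f.txt
git commit -q -m "Base"

git checkout -q -b source-branch
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix"
picked=$(git rev-parse HEAD)              # the commit we want
echo "extra" > extra.txt
git add extra.txt
git commit -q -m "Extra work we do not want"

git checkout -q master
git cherry-pick -x "$picked" > /dev/null  # copy only the fix onto master

new=$(git rev-parse HEAD)
body=$(git log -1 --format=%b)            # message body, where -x leaves its note
echo "new commit: $new"
echo "note: $body"
```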

  43. How do I find what all I committed for this sprint?

    Use the following command:

    $ git log --author=your_user_name --since=2.weeks

    Adjust to your sprint length in weeks, days, even hours. Want to see all your diffs so far:

    $ git log --author=your_user_name --since=2.weeks --format=%H | xargs git show >changes.diff

    More generally, to check the difference between two specific points in time, you can use the following (older date first):
    $ git diff my-branch@{2015-09-28}..my-branch@{2015-09-30}

  44. How do I find out how many commits have gone into my current dev branch?
    $ git rev-list master.. --count
    

    That's assuming your parent branch is called master.

  45. How do I find out whether a branch contains a particular commit?

    You have to do it the other way round. You figure out the branches that contain a particular commit and check whether the specific branch you're interested in is one of them. Use the following command:

    $ git branch --contains <commit-id>
    

  46. How do I check the commit history of a specific file?
    $ git log -p path_to_file
    
  47. I want to move a file from one repository to another (say proj1 to proj2). Is there a way to do it that preserves history?

    Yes there is a way, but it won't happen with a single command and will require some manual intervention. So let us suppose you want to move file Foo.java from path src/bar under repository proj1 to path src/baz under repository proj2. Here are the commands you need to run in sequence:
    $ cd proj1
    $ git log --pretty=email --patch-with-stat --reverse -- src/bar/Foo.java > ../Foo.patch
    $ vi ../Foo.patch
    ... Inside vi, run the following sed script in command mode
    :% s^/src/bar^/src/baz^g
    :wq
    $ cd ../proj2
    $ git am < ../Foo.patch
    

    The last command recreates the entire commit history in the target repository (proj2). Now in general, moving an entire repo into another as a subdirectory is much more common. This link here shows a nice way of doing it. Just remember to use --allow-unrelated-histories during git merge.

  48. How can I import one git repository as a subdirectory of another repository?

    You need to use the git subtree command as shown below:
    $ cd target-repo
    $ git subtree add --prefix=target/subdir https://source.repo/foo/bar.git <branch-name>
    

    In the above example, you enter the target repo directory, then import source.repo under the target/subdir subdirectory of your target repo. You also specify the branch from which you pull. The directories are created as needed. This will add the entire history of commits from the specified branch in source.repo to your current repo, plus create a merge commit. You would then push your changes in the target repo upstream.

    Sometimes before merging, you may need to change the committer ids of all the previous commits in case your git administrator has disabled merging commits with a different committer. You can do this by changing the committer id for all the commits on a branch (note that the author of these commits would remain unchanged, so git blame would still correctly show the original author, not you). The best way is to change this at source, i.e. in the source repository on the branch that you would then pull into the destination repo.
    $ git checkout target-branch
    $ git filter-branch --env-filter '                                    
    export GIT_COMMITTER_NAME="i.am"
    export GIT_COMMITTER_EMAIL="i.am@mydomain.com"
    ' --tag-name-filter cat -- --branches --tags
  49. Something seems wrong with my working tree / local repository. Pulls / merges don't seem to be working and it is showing unstaged files I haven't even changed. What do I do?
    To start with, run:
    $ git gc
    

    Make sure no other programs are running git commands on the same repository in the background. This means closing Eclipse, SourceTree, any program that might be running some git commands in the background, etc. You should also periodically clean up your repository of dangling commits, etc. Run the following commands:
    $ git fsck
    $ git prune
    
  50. My remote url has changed (i.e. my origin has changed), do I have to clone the whole repository again from the new URL ?
    No. You can easily change the origin's URL or the remote URL of your local repository. You can do this using the git remote command.
    $ git remote set-url origin <new-url>
    
  51. How do I find out who made how many commits?
    You can run the following command:
    $ git shortlog -s -n
    

    But do note that the same author could show up with multiple names depending on whether their user name changed or not.

  52. How do I list the last 10 commits by a particular author?
    $ git log -n 10 --author=amukher1








  53. How do I create a local branch and push from there into a remote branch?
    Do the following:
    $ git fetch --all
    $ git fetch -t
    $ git branch <local_branch_name> remotes/origin/<remote_branch_name>
    

    The last step creates a tracking branch.







  54. How do I clone a single remote branch of a repository?
    Do the following:
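    $ git clone <url> --branch <branch-name> --single-branch
    

    The --single-branch switch restricts the clone to the history of that one branch; without it, --branch only selects which branch gets checked out. The checked-out branch tracks its remote counterpart.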







  55. How do I update a local branch from its tracking branch?
    Simply type:
    $ git pull
    
  56. How can I add an existing repository to github?
    Create a new repository on github using the web interface. Let's say it's called newrepo. Now go to your local repository root dir and run the following commands.
    $ git init
    $ git remote add origin https://github.com/username/newrepo
    $ git push -u origin master
    








  57. I stashed my work but have now lost it. How can I recover it?
    Run the following command:
    $ for ref in `git fsck --unreachable | grep commit | cut -d' ' -f3`; do git show --summary $ref; done | less
    

    This will neatly list the unreachable commits (including stash commits). Identify which one you need and do a:
    $ git show <sha1>
    

  58. Is there an easy way to list just file names and not their contents in a commit or diff?
    Yes. When in doubt, try the --name-only option on a git command. It will usually suffice:
    $ git show --name-only
    

    This will list the files in the last commit. You can easily extend that to diff:
    $ git diff --name-only
    

  59. How can I see the state of a file at a particular commit in the past?
    This way:
    $ git show <commit-sha1>:<file-path>
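
    For instance, in a scratch repository with a hypothetical file f.txt that has been committed twice, the commit:path form prints the older version without touching the workspace:

```shell
set -eu
# Scratch repository (hypothetical setup for the illustration).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

echo "old" > f.txt
git add f.txt
git commit -q -m "Old version"
old_commit=$(git rev-parse HEAD)

echo "new" > f.txt
git commit -q -am "New version"

past=$(git show "$old_commit:f.txt")      # file content as of the older commit
now=$(cat f.txt)                          # workspace copy is untouched
echo "then: $past, now: $now"
```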
    

  60. How can I list all the remote branches I have created (so that I can clean them up later)?
    This way:
    $ git for-each-ref --format='%(committerdate) %09 %(authorname) %09 %(refname)' | sort -k5n -k2M -k3n -k4n| grep <author-name> | awk '{print $8}'| grep origin| sed 's/refs\/remotes\///'
    



  61. How can I squash all my commits before merging into another branch?
    Let's say you have a branch called fix-NNN and you want to squash merge it into feature-NNN:
    $ git checkout feature-NNN
    $ git merge --squash fix-NNN    # squashes and adds to the index
    $ git commit
    Things work best and cleanest if fix-NNN was branched off feature-NNN. So how does this apply to the common case where we want to squash everything on a working branch and raise a PR?
    $ BASE_COMMIT=`git merge-base main dev-branch`
    $ git checkout -b squashed-dev $BASE_COMMIT
    $ git merge --squash dev-branch     # squashes commits from dev-branch
    $ git commit                        # to squashed-dev
    Another alternative (which also works on older git) is the following:
    $ git show-ref fix-NNN
    sha1... refs/heads/fix-NNN   # make a note of the SHA1 -> head-of-fix-NNN
    $ git merge feature-NNN
    $ git stash    # stash everything in progress
    $ git reset head-of-fix-NNN
    $ git add --all
    $ git commit     # effectively squashes
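
    As a quick sanity check of the merge-base approach, the sketch below builds a throwaway repository with three commits on dev-branch and squashes them into a single commit on squashed-dev. The file name app.txt and the commit messages are invented for the example.

```shell
set -e
# Identity for the throwaway commits, so the sketch runs without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d) && cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main
echo base > app.txt
git add app.txt
git commit -q -m "base"

# Three separate commits on the working branch.
git checkout -q -b dev-branch
for n in 1 2 3; do
  echo "change $n" >> app.txt
  git commit -q -am "dev change $n"
done

# Start a new branch at the fork point and squash dev-branch into it.
base=$(git merge-base main dev-branch)
git checkout -q -b squashed-dev "$base"
git merge --squash dev-branch > /dev/null 2>&1
git commit -q -m "all dev changes as one commit"

git rev-list --count main..squashed-dev     # 1
git diff --quiet dev-branch squashed-dev    # exit status 0: identical content
```

    The squashed-dev branch carries the same content as dev-branch in one commit, which is what you would push for the PR.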
    


  62. How can I get the last common commit between two branches?
    You're in luck - git has a specific command for the purpose:
    $ git merge-base branch1 branch2
    

  63. How can I find the commits in one branch not present in the other?
    Simple way to find commits in branch b1 not present in b2:
    $ git log b2..b1
    

    If we want to find commits from b1 and b2, missing on multiple branches b3 and b4:
    $ git log --no-merges b1 b2 ^b3 ^b4  # use ^^ instead of ^ on Windows cmd shell
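
    A small sketch tying the last two answers together: two branches b1 and b2 fork from main in a throwaway repository, and we ask git for their common ancestor and for the commits unique to b1. The file names and commit messages are made up.

```shell
set -e
# Identity for the throwaway commits, so the sketch runs without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d) && cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main
echo base > base.txt
git add base.txt
git commit -q -m "base"

# b1 and b2 both branch off main and get one commit each.
git checkout -q -b b1
echo one > one.txt
git add one.txt
git commit -q -m "only on b1"

git checkout -q main
git checkout -q -b b2
echo two > two.txt
git add two.txt
git commit -q -m "only on b2"

common=$(git merge-base b1 b2)            # the commit where the branches forked
only_b1=$(git log --format=%s b2..b1)     # subjects of commits in b1 but not in b2
echo "$only_b1"                           # only on b1
```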
    

  64. I squash-merged my dev branch into my github feature branch. But I continued adding more commits after that on my dev branch. How can I now incrementally merge / rebase my dev branch?
    Let's suppose that in your dev branch, the commits are a1, a2, a3, a4, a5. You squash merged till a2 so a1 and a2 are no longer available in your feature branch. If you now tried rebasing your dev branch on the feature branch you will get conflicts due to the changes in a1, a2 being present in the feature branch with a different commit. What you should be able to do is rebase from a3 onwards on to the head of the feature branch. You do this with:

    $ git rebase --onto feature-branch a2
    
    Note that a2 is the old parent of the commit which is being reparented, and feature-branch is the new parent.
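
    The scenario above can be replayed in a throwaway repository. This is only a sketch: each of the commits a1 to a5 adds its own made-up file so that nothing conflicts, and feature-branch doubles as the initial branch.

```shell
set -e
# Identity for the throwaway commits, so the sketch runs without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d) && cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/feature-branch
echo base > base.txt
git add base.txt
git commit -q -m "base"

# dev-branch gets five commits a1..a5, each adding its own file.
git checkout -q -b dev-branch
for n in 1 2 3 4 5; do
  echo "change $n" > "a$n.txt"
  git add "a$n.txt"
  git commit -q -m "a$n"
done
a2=$(git rev-parse dev-branch~3)

# Squash-merge everything up to a2 into feature-branch.
git checkout -q feature-branch
git merge --squash "$a2" > /dev/null 2>&1
git commit -q -m "squashed a1 and a2"

# Replay only a3..a5 on top of feature-branch.
git checkout -q dev-branch
git rebase -q --onto feature-branch "$a2"

git log --format=%s feature-branch..dev-branch   # a5, a4, a3 (newest first)
```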


  65. What are git submodules, subtrees, and worktrees good for?
    Git submodules are a way to include another repository as a subdirectory of a host repository. The host repo's history remains distinct from the submodule's, and the host repo records a reference to a specific commit of the submodule, which must be explicitly updated as needed.
    Git subtrees are slightly different from submodules. They allow including all or part of another repository inside a host repository but their distinct identity is lost. The pulled content becomes a series of commits in the host repo, or a single commit if the squash option is used. These commits can be pushed, or discarded after builds, but they don't affect the remote repo.
    Worktrees are unrelated to these but are a useful client-side tool. A worktree represents a directory - typically a subdirectory of the repo directory - that contains a specific branch of the repo checked out. One can simultaneously work on multiple branches by creating worktrees associated with them. While these are useful, they can result in a local workflow of switching between multiple worktrees, which can be cumbersome.
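
    For worktrees, a minimal sketch looks like this. It assumes a throwaway repository, and the branch name hotfix and the directory ../repo-hotfix are made up. The extra directory is placed next to the repo here purely as a choice for the example.

```shell
set -e
# Identity for the throwaway commits, so the sketch runs without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d) && cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main
echo base > app.txt
git add app.txt
git commit -q -m "base"

# Create a new branch "hotfix" and check it out in a second directory.
git worktree add -b hotfix ../repo-hotfix > /dev/null 2>&1

git worktree list                                    # one line per working directory
git -C ../repo-hotfix rev-parse --abbrev-ref HEAD    # hotfix
git rev-parse --abbrev-ref HEAD                      # main (original directory untouched)
```

    When the extra directory is no longer needed, git worktree remove followed by its path cleans it up.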


  66. Additional references

  1. Git from the bottom-up: https://jwiegley.github.io/git-from-the-bottom-up
  2. Git revision selection: http://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
  3. Git-aware bash prompts: https://github.com/jimeh/git-aware-prompt
  4. A Hacker's Guide to Git: https://wildlyinaccurate.com/a-hackers-guide-to-git/
