Skip to content
This repository has been archived by the owner on Feb 13, 2023. It is now read-only.
oguzhanyetimoglu edited this page Feb 21, 2018 · 12 revisions

1. What is Git?

Git is a version control system (VCS) for tracking changes in computer files and coordinating work on those files among multiple people. It is primarily used for source code management in software development, but it can be used to keep track of changes in any set of files. As a distributed revision control system it is aimed at speed, data integrity, and support for distributed, non-linear workflows.

Git was created by Linus Torvalds in 2005 for development of the Linux kernel, with other kernel developers contributing to its initial development.Its current maintainer since 2005 is Junio Hamano.


2. Git Basics

As you learn Git, try to clear your mind of the things you may know about other VCSs, such as CVS, Subversion or Perforce — doing so will help you avoid subtle confusion when using the tool. Even though Git’s user interface is fairly similar to these other VCSs, Git stores and thinks about information in a very different way, and understanding these differences will help you avoid becoming confused while using it.

2.1. Snapshots, Not Differences

The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes. These other systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they store as a set of files and the changes made to each file over time (this is commonly described as delta-based version control).

Figure 1

Figure 1. Storing data as changes to a base version of each file.

Git doesn’t think of or store its data this way. Instead, Git thinks of its data more like a series of snapshots of a miniature filesystem. With Git, every time you commit, or save the state of your project, Git basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored. Git thinks about its data more like a stream of snapshots.

Figure 2

Figure 2. Storing data as snapshots of the project over time.

This is an important distinction between Git and nearly all other VCSs. It makes Git reconsider almost every aspect of version control that most other systems copied from the previous generation. This makes Git more like a mini filesystem with some incredibly powerful tools built on top of it, rather than simply a VCS.

2.2. Nearly Every Operation Is Local

Most operations in Git need only local files and resources to operate — generally no information is needed from another computer on your network. If you’re used to a CVCS where most operations have that network latency overhead, this aspect of Git will make you think that the gods of speed have blessed Git with unworldly powers. Because you have the entire history of the project right there on your local disk, most operations seem almost instantaneous.

For example, to browse the history of the project, Git doesn’t need to go out to the server to get the history and display it for you — it simply reads it directly from your local database. This means you see the project history almost instantly. If you want to see the changes introduced between the current version of a file and the file a month ago, Git can look up the file a month ago and do a local difference calculation, instead of having to either ask a remote server to do it or pull an older version of the file from the remote server to do it locally.

This also means that there is very little you can’t do if you’re offline or off VPN. If you get on an airplane or a train and want to do a little work, you can commit happily (to your local copy, remember?) until you get to a network connection to upload. If you go home and can’t get your VPN client working properly, you can still work. In many other systems, doing so is either impossible or painful. In Perforce, for example, you can’t do much when you aren’t connected to the server; and in Subversion and CVS, you can edit files, but you can’t commit changes to your database (because your database is offline). This may not seem like a huge deal, but you may be surprised what a big difference it can make.

2.3. Git Has Integrity

Everything in Git is check-summed before it is stored and is then referred to by that checksum. This means it’s impossible to change the contents of any file or directory without Git knowing about it. This functionality is built into Git at the lowest levels and is integral to its philosophy. You can’t lose information in transit or get file corruption without Git being able to detect it.

The mechanism that Git uses for this checksumming is called a SHA-1 hash. This is a 40-character string composed of hexadecimal characters (0–9 and a–f) and calculated based on the contents of a file or directory structure in Git. A SHA-1 hash looks something like this:

24b9da6552252987aa493b52f8696cd6d3b00373

You will see these hash values all over the place in Git because it uses them so much. In fact, Git stores everything in its database not by file name but by the hash value of its contents.

2.4. Git Generally Only Adds Data

When you do actions in Git, nearly all of them only add data to the Git database. It is hard to get the system to do anything that is not undoable or to make it erase data in any way. As with any VCS, you can lose or mess up changes you haven’t committed yet, but after you commit a snapshot into Git, it is very difficult to lose, especially if you regularly push your database to another repository.

This makes using Git a joy because we know we can experiment without the danger of severely screwing things up.

2.5. The Three States

Pay attention now — here is the main thing to remember about Git if you want the rest of your learning process to go smoothly. Git has three main states that your files can reside in: committed, modified, and staged:

  • Committed means that the data is safely stored in your local database.
  • Modified means that you have changed the file but have not committed it to your database yet.
  • Staged means that you have marked a modified file in its current version to go into your next commit snapshot.

This leads us to the three main sections of a Git project: the Git directory, the working tree, and the staging area.

Figure 3

Figure 3. Working tree, staging area, and Git directory.

The Git directory is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.

The working tree is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.

The staging area is a file, generally contained in your Git directory, that stores information about what will go into your next commit. Its technical name in Git parlance is the “index”, but the phrase “staging area” works just as well.

The basic Git workflow goes something like this:

  1. You modify files in your working tree.
  2. You selectively stage just those changes you want to be part of your next commit, which adds only those changes to the staging area.
  3. You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.

f a particular version of a file is in the Git directory, it’s considered committed. If it has been modified and was added to the staging area, it is staged. And if it was changed since it was checked out but has not been staged, it is modified.

2.6. Some Advantages

  • Git is a fast and modern implementation of version control.
  • Git provides a history of content changes.
  • Git facilitates collaborative changes to files.
  • Git is easy to use for any type of knowledge worker.

3. Basic Git commands

Git task Notes Git commands
Tell Git who you are Configure the author name and email address to be used with your commits. Note that Git strips some characters (for example trailing periods) from user.name. git config --global user.name "Sam Smith" or git config --global user.email sam@example.com
Create a new local repository git init
Check out a repository Create a working copy of a local repository: git clone /path/to/repository
Check out a repository For a remote server, use: git clone username@host:/path/to/repository
Add files Add one or more files to staging (index): git add <filename> or git add *
Commit Commit changes to head (but not yet to the remote repository): git commit -m "Commit message"
Commit Commit any files you've added with git add, and also commit any files you've changed since then: git commit -a
Push Send changes to the master branch of your remote repository: git push origin master
Status List the files you've changed and those you still need to add or commit: git status
Connect to a remote repository If you haven't connected your local repository to a remote server, add the server to be able to push to it: git remote add origin <server>
Connect to a remote repository List all currently configured remote repositories: git remote -v
Branches Create a new branch and switch to it: git checkout -b <branchname>
Branches Switch from one branch to another: git checkout <branchname>
Branches List all the branches in your repo, and also tell you what branch you're currently in: git branch
Branches Delete the feature branch: git branch -d <branchname>
Branches Push the branch to your remote repository, so others can use it: git push origin <branchname>
Branches Push all branches to your remote repository: git push --all origin
Branches Delete a branch on your remote repository: git push origin :<branchname>
Update from the remote repository Fetch and merge changes on the remote server to your working directory: git pull
Update from the remote repository To merge a different branch into your active branch: git merge
Update from the remote repository View all the merge conflicts: git diff
Update from the remote repository View the conflicts against the base file: git diff --base <filename>
Update from the remote repository Preview changes, before merging: git diff <sourcebranch> <targetbranch>
Update from the remote repository After you have manually resolved any conflicts, you mark the changed file: git add <filename>
Tags You can use tagging to mark a significant changeset, such as a release: git tag 1.0.0 <commitID>
Tags Commit Id is the leading characters of the changeset ID, up to 10, but must be unique. Get the ID using: git log
Tags Push all tags to remote repository: git push --tags origin
Undo local changes If you mess up, you can replace the changes in your working tree with the last content in head: Changes already added to the index, as well as new files, will be kept. git checkout -- <filename>
Undo local changes Instead, to drop all your local changes and commits, fetch the latest history from the server and point your local master branch at it, do this: git fetch origin or git reset --hard origin/master
Search Search the working directory for foo(): git grep "foo()"

4. References

Clone this wiki locally