Source Guide
The current information (which dates to 2018) is substantially accurate for the core "Cabal" repository (the core package builder, with a Setup.hs interface, as opposed to the manifest parser and the CLI). And the Cabal subrepository has grown from around 24,000 lines of code to 33,000 SLOC. However, the wider repository is currently around 133,000 lines of code and growing, with cabal-install (the CLI tool) being the largest package.
For details on the package at large, the reader is better off gleaning information from README.md.
For details on the recommended code style, check out CONTRIBUTING.md.
While this is still a mostly good description of the Cabal (core) subrepository, the subrepository has been massively refactored since then, and quite a few of the links no longer work.
Imported from Trac wiki, in the process of being updated to match the current reality. Nothing below should be substantially wrong, but it's still somewhat incomplete.
On first look the Cabal code seems large and intimidating. This page is intended to give you a head start in understanding it.
All the Cabal modules live under Distribution.*
The modules can be roughly divided into two groups:
- The declarative modules: They are mostly concerned with data structures like package descriptions. These modules live under `Distribution.*`. Much of the code in these modules consists of utility functions for handling the data types, along with functions for parsing and showing them.
- The active modules: They are concerned with actually doing things like configuring, building and installing packages. These modules live under `Distribution.Simple.*`.
According to SLOCCount Cabal is currently about 23,500 lines of code. This breaks down as about 7,000 lines for the declarative part and about 16,500 for the active part. Most modules are less than a few hundred lines, though there are a couple monsters nearer 1,000 lines.
Cabal is 100% Haskell. It is written in Haskell2010 with a fair number of extensions. Formerly, it was written in fairly pure Haskell98 and avoided dependencies on non-core packages, but this is no longer true: it now depends on `transformers`, `mtl`, `parsec`, and `text`.
- Distribution/GetOpt.hs (source) (no docs - hidden module): This should live under Compat/, since it is just a bundled version of the standard GetOpt. Not very interesting.
-
Distribution/Version.hs (source) (docs): exports the `Version` type along with a parser and pretty printer. A version is something like "1.3.3". It also defines a `VersionRange` data type. Version ranges are like ">= 1.2 && < 2".
-
Distribution/Package.hs (source) (docs): defines a package identifier along with a parser and pretty printer for it. It also defines `PackageIdentifier`s and `Dependency`s. Identifiers consist of a name and an exact version, and dependencies consist of a name and a version range.
-
Distribution/Verbosity.hs (source) (docs): a simple `Verbosity` type with associated utilities. There are 4 standard verbosity levels, from `Silent`, `Normal`, `Verbose` up to `Deafening`, with further control in private flags. This is used for deciding what logging messages to print in the active parts.
-
Distribution/Compiler.hs (source) (docs): This has an enumeration of the various compilers that Cabal knows about. It also specifies the default compiler. Sadly you'll often see code that does case analysis on this compiler flavour enumeration like:

    case compilerFlavor comp of
      GHC -> GHC.getInstalledPackages verbosity packageDb progconf
      JHC -> JHC.getInstalledPackages verbosity packageDb progconf

Obviously it would be better to use the proper `Compiler` abstraction because that would keep all the compiler-specific code together. Unfortunately we cannot make this change yet without breaking the `UserHooks` API, which would break all custom `Setup.hs` files, so for the moment we just have to live with this deficiency. ("For the moment" has been twelve years. It's safe to say that this probably isn't happening.) If you're interested, see issue #57.
-
Distribution/System.hs (source) (docs): Cabal often needs to do slightly different things on specific platforms. You probably know about `System.Info.os :: String`; however, using it is very inconvenient because it is a string, and different Haskell implementations do not agree on using the same strings for the same platforms! (In particular see the controversy over "windows" vs "mingw32".) So to make it more consistent and easy to use we have an `OS` enumeration. This module also performs a similar duty for the CPU architecture of the system.
-
Distribution/License.hs (source) (docs): The `.cabal` file allows you to specify a license file. Of course you can use any license you like, but people often pick common open source licenses and it's useful if we can automatically recognise that (e.g. so we can display it on the Hackage web pages). So you can also specify the license itself in the `.cabal` file from a short enumeration defined in this module. It includes `GPL`, `LGPL` and `BSD3` licenses. This works with a subset of `SPDX.License`s.
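As a rough illustration of the `Version`, `VersionRange` and `OS` types described above, here is a minimal sketch. It assumes a reasonably recent Cabal library (the `mkVersion` constructor function appeared in Cabal 2.0) is available to GHC. The names `exampleRange`, `inExampleRange`, `osLabel` and `currentOsLabel`, and the label strings, are made up for this example and are not part of Cabal.

```haskell
import Distribution.System (OS (..), buildOS)
import Distribution.Version
  ( VersionRange
  , earlierVersion
  , intersectVersionRanges
  , mkVersion
  , orLaterVersion
  , withinRange
  )

-- The range ">= 1.2 && < 2", built from combinators rather than parsed.
exampleRange :: VersionRange
exampleRange =
  intersectVersionRanges
    (orLaterVersion (mkVersion [1, 2]))
    (earlierVersion (mkVersion [2]))

-- Does a version, given as its numeric components, fall inside the range?
inExampleRange :: [Int] -> Bool
inExampleRange components = mkVersion components `withinRange` exampleRange

-- Case analysis on the OS enumeration instead of comparing strings.
-- The labels returned here are hypothetical, chosen for this example only.
osLabel :: OS -> String
osLabel Windows = "windows"
osLabel Linux   = "linux"
osLabel OSX     = "macos"
osLabel _       = "other"

-- The label for the platform this code was built on.
currentOsLabel :: String
currentOsLabel = osLabel buildOS
```

Loaded into GHCi, `inExampleRange [1,3,3]` should hold while `inExampleRange [2,0]` should not, because versions compare component by component.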
-
Distribution/ParseUtils.hs (source) (no docs - hidden module): The `.cabal` file format is not trivial, especially with the introduction of configurations and the section syntax that goes with that. This module has a bunch of parsing functions that are used by the `.cabal` parser and a couple of others. It has the parsing framework code and also little parsers for many of the formats we get in various `.cabal` file fields, like module names, comma separated lists etc.
-
Distribution/PackageDescription.hs (source) (docs): This defines the data structure for the `.cabal` file format. There are several parts to this structure. It has top level info and then `Library` and `Executable` sections, each of which has associated `BuildInfo` data that's used to build the library or exe. To further complicate things there is both a `PackageDescription` and a `GenericPackageDescription`. This distinction relates to [Cabal configurations](Cabal configurations). When we initially read a `.cabal` file we get a `GenericPackageDescription` which has all the conditional sections. Before actually building a package we have to decide on each conditional. Once we've done that we get a `PackageDescription`. It was done this way initially to avoid breaking too much stuff when the feature was introduced. It could probably do with being rationalised at some point to make it simpler.

This has been split apart into several more files since this was last updated, but that isn't crucial.
-
Distribution/PackageDescription/Configuration.hs (source) (docs): This is about the [Cabal configurations](Cabal configurations) feature. It exports `finalizePackageDescription` and `flattenPackageDescription`, which are functions for converting `GenericPackageDescriptions` down to `PackageDescriptions`. It has code for working with the tree of conditions and resolving or flattening conditions.
-
Distribution/PackageDescription/Parse.hs (source) (docs): This defines parsers and partial pretty printers for the `.cabal` format. Some of the complexity in this module is due to the fact that we have to be backwards compatible with old `.cabal` files, so there's code to translate into the newer structure.
-
Distribution/PackageDescription/Check.hs (source) (docs): This has code for checking for various problems in packages. There is one set of checks that just looks at a `PackageDescription` in isolation and another set of checks that also looks at files in the package. Some of the checks are basic sanity checks, others are portability standards that we'd like to encourage. There is a `PackageCheck` type that distinguishes the different kinds of check so we can see which ones are appropriate to report in different situations. This code gets used when configuring a package, when we consider only basic problems. The higher standard is used when preparing a source tarball and by Hackage when uploading new packages. The reason for this is that we want to hold packages that are expected to be distributed to a higher standard than packages that are only ever expected to be used in the author's own environment.
-
Distribution/InstalledPackageInfo.hs (source) (docs): The `.cabal` file format is for describing a package that is not yet installed. It has a lot of flexibility like conditionals and dependency ranges. As such, that format is not at all suitable for describing a package that has already been built and installed. By the time we get to that stage we have resolved all conditionals and resolved dependency version constraints to exact versions of dependent packages. So this module defines the `InstalledPackageInfo` data structure that contains all the info we keep about an installed package. There is a parser and pretty printer. The textual format is rather simpler than the `.cabal` format; there are no sections, for example. This is the format that `ghc-pkg` understands.
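To make the `GenericPackageDescription` to `PackageDescription` step described above more concrete, here is a small sketch. It assumes a Cabal library recent enough to provide the `Distribution.PackageDescription.Parsec` and `Distribution.Pretty` modules, plus the `bytestring` package. The `.cabal` text and the names `demoCabal` and `demoPackageId` are invented for this example. Note that `flattenPackageDescription` merges all conditional branches rather than deciding them, so it is only a convenient shortcut for inspection, not what the configure step does.

```haskell
import qualified Data.ByteString.Char8 as BS
import Distribution.PackageDescription (package)
import Distribution.PackageDescription.Configuration (flattenPackageDescription)
import Distribution.PackageDescription.Parsec (parseGenericPackageDescriptionMaybe)
import Distribution.Pretty (prettyShow)

-- A tiny, made-up .cabal file used only for this example.
demoCabal :: BS.ByteString
demoCabal =
  BS.pack $
    unlines
      [ "cabal-version: 2.2"
      , "name:          demo"
      , "version:       0.1.0.0"
      , "build-type:    Simple"
      , ""
      , "library"
      , "  exposed-modules:  Demo"
      , "  build-depends:    base"
      , "  default-language: Haskell2010"
      ]

-- Parse to a GenericPackageDescription, flatten it to a PackageDescription,
-- and render the package identifier. Nothing means the text did not parse.
demoPackageId :: Maybe String
demoPackageId = do
  gpd <- parseGenericPackageDescriptionMaybe demoCabal
  pure (prettyShow (package (flattenPackageDescription gpd)))
```

Evaluating `demoPackageId` in GHCi should give the package name and version joined by a hyphen, wrapped in `Just`.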
-
Distribution/Simple/Program.hs (source) (docs): This provides an abstraction which deals with configuring and running programs. A `Program` is a static notion of a known program. A `ConfiguredProgram` is a `Program` that has been found on the current machine and is ready to be run (possibly with some user-supplied default args). Configuring a program involves finding its location and if necessary finding its version. There is also a `ProgramConfiguration` type which holds configured and not-yet configured programs. It is the parameter to lots of actions elsewhere in Cabal that need to look up and run programs. If we had a Cabal monad, the `ProgramConfiguration` would probably be a reader or state component of it. The module also defines all the known built-in `Programs` and the `defaultProgramConfiguration` which contains them all.
-
Distribution/Simple/Command.hs (source) (docs): This is to do with command line handling. The Cabal command line is organised into a number of named sub-commands (much like darcs). The `Command` abstraction represents one of these sub-commands, with a name, a description and a set of flags. `Command`s can be associated with actions and run. It handles some common stuff automatically, like the `--help` and command line completion flags. It is designed to allow other tools to make derived commands. This feature is used heavily in cabal-install.
-
Distribution/Simple/InstallDirs.hs (source) (docs): This manages everything to do with where files get installed (though it does not get involved with actually doing any installation). It provides an `InstallDirs` type which is a set of directories for where to install things. It also handles the fact that we use templates in these install dirs. For example, most install dirs are relative to some `$prefix`, and by changing the prefix all other dirs still end up changed appropriately. So it provides a `PathTemplate` type and functions for substituting for these templates.
-
Distribution/Simple/Compiler.hs (source) (docs): This should be a much more sophisticated abstraction than it is. Currently it's just a bit of data about the compiler, like its flavour, name and version. The reason it's just data is that currently it has to be in `Read` and `Show` so it can be saved along with the `LocalBuildInfo`. The only interesting bit of info it contains is a mapping between language extensions and compiler command line flags. This module also defines a `PackageDB` type which is used to refer to package databases. Most compilers only know about a single global package collection, but GHC has a global and a per-user one, and it lets you create arbitrary other package databases. We do not yet support this latter feature very much.
-
Distribution/Simple/PreProcess.hs (source) (docs): This defines a `PreProcessor` abstraction which represents a pre-processor that can transform one kind of file into another. There is also a `PPSuffixHandler` which is a combination of a file extension and a function for configuring a `PreProcessor`. It defines a bunch of known built-in preprocessors like cpp, cpphs, c2hs, hsc2hs, happy, alex etc and lists them in `knownSuffixHandlers`. On top of this it provides a function for actually preprocessing some sources given a bunch of known suffix handlers. This module is not as good as it could be; it could really do with a rewrite to address some of the problems we have with pre-processors.
-
Distribution/Simple/Utils.hs (source) (docs): A large and somewhat miscellaneous collection of utility functions used throughout the rest of the Cabal lib and in other tools that use the Cabal lib like cabal-install. It has a very simple set of logging actions. It has low level functions for running programs, a bunch of wrappers for various directory and file functions that do extra logging.
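The `$prefix` templating handled by `Distribution.Simple.InstallDirs`, described above, can be tried out with a few of its exported functions. This is a minimal sketch that assumes the `toPathTemplate`, `substPathTemplate` and `fromPathTemplate` functions and the `PrefixVar` constructor are exported as in recent Cabal releases. The helper name `expandPrefix` is made up for this example.

```haskell
import Distribution.Simple.InstallDirs
  ( PathTemplateVariable (PrefixVar)
  , fromPathTemplate
  , substPathTemplate
  , toPathTemplate
  )

-- Expand "$prefix" in a template string against a chosen prefix.
-- Only the prefix variable is bound; other variables are left untouched.
expandPrefix :: FilePath -> String -> FilePath
expandPrefix prefix template =
  fromPathTemplate (substPathTemplate env (toPathTemplate template))
  where
    env = [(PrefixVar, toPathTemplate prefix)]
```

For instance, `expandPrefix "/usr/local" "$prefix/bin"` and `expandPrefix "/opt/demo" "$prefix/bin"` share one template but yield different directories, which is the point of keeping install dirs relative to the prefix.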
-
Distribution/Simple/LocalBuildInfo.hs (source) (docs): Once a package has been configured we have resolved conditionals and dependencies, and configured the compiler and other needed external programs. The `LocalBuildInfo` is used to hold all this information. It holds the install dirs, the compiler, the exact package dependencies, the configured programs, the package database to use and a bunch of miscellaneous configure flags. It gets saved and reloaded from a file (`dist/setup-config`). It gets passed in to very many subsequent build actions.
-
Distribution/Simple/Configure.hs (source) (docs): This deals with the configure phase. It provides the `configure` action which is given the package description and configure flags. It then tries to:
  - configure the compiler
  - resolve any conditionals in the package description
  - resolve the package dependencies
  - check if all the extensions used by this package are supported by the compiler
  - check that all the build tools are available (including version checks if appropriate)
  - check for any required pkg-config packages (updating the `BuildInfo` with the results)

Then based on all this it saves the info in the `LocalBuildInfo` and writes it out to a file. It also displays various details to the user, the amount of information displayed depending on the verbosity level.
-
Distribution/Simple/Build.hs (source) (docs): This is the entry point to actually building the modules in a package. It doesn't actually do much itself; most of the work is delegated to compiler-specific actions. It does do some non-compiler-specific bits like running pre-processors.
-
Distribution/Simple/Build/PathsModule.hs (source) (docs): Generates the `Paths_pkgname` module. This is a module that Cabal generates for the benefit of packages. It enables them to find their version number and find any installed data files at runtime. This code should probably be split off into another module.
-
Distribution/Simple/Install.hs (source) (docs): This is the entry point into installing a built package. It does the generic bits and then calls compiler-specific functions to do the rest.
-
Distribution/Simple/Haddock.hs (source) (docs): This module deals with the haddock and hscolour commands. Sadly this is a rather complicated module. It has to call ghc-pkg to find the locations of documentation for dependent packages, so it can create links. The hscolour support allows generating html versions of the original source, with coloured syntax highlighting.
-
Distribution/Simple/Register.hs (source) (docs): This module deals with registering and unregistering packages. There are a couple of ways it can do this: one is to do it directly, another is to generate a script that can be run later to do it. The idea here is that the user is shielded from the details of what command to use for package registration for a particular compiler. In practice this aspect was not especially popular, so we also provide a way to simply generate the package registration file, which then must be manually passed to ghc-pkg. It is possible to generate registration information for where the package is to be installed, or alternatively to register the package inplace in the build tree. The latter is occasionally handy, and will become more important when we try to build multi-package systems. This module does not delegate anything to the per-compiler modules but just mixes it all into this module, which is rather unsatisfactory. The script generation and the unregister feature are not well used or tested.
-
Distribution/Simple/SrcDist.hs (source) (docs): This handles the `sdist` command. The module exports an `sdist` action but also some of the phases that make it up, so that other tools can use just the bits they need. In particular the preparation of the tree of files to go into the source tarball is separated from actually building the source tarball. The sdist action also does some distribution QA checks.
-
Distribution/Simple/GHC.hs (source) (docs): This is a fairly large module. It contains most of the GHC-specific code for configuring, building and installing packages. It also exports a function for finding out what packages are already installed. Configuring involves finding the ghc and ghc-pkg programs, finding what language extensions this version of ghc supports and returning a `Compiler` value. `getInstalledPackages` involves calling the ghc-pkg program to find out what packages are installed. Building is somewhat complex as there is quite a bit of information to take into account. We have to build libs and programs, possibly for profiling and shared libs. We have to support building libraries that will be usable by GHCi and also ghc's `-split-objs` feature. We have to compile any C files using ghc. Linking, especially for `split-objs`, is remarkably complex, partly because there tend to be thousands of .o files and this can often be more than we can pass to the ld or ar programs in one go. There is also some code for generating `Makefile`s but the less said about that the better. Installing for libs and exes involves finding the right files and copying them to the right places. One of the more tricky things about this module is remembering the layout of files in the build directory (which is not explicitly documented) and thus what search dirs are used for various kinds of files.
-
Distribution/Simple/UserHooks.hs (source) (docs): This defines the API that `Setup.hs` scripts can use to customise the way the build works. This module just defines the `UserHooks` type. The predefined sets of hooks that implement the `Simple`, `Make` and `Configure` build systems are defined in `Distribution.Simple`. The `UserHooks` is a big record of functions. There are 3 for each action: a pre, a post and the action itself. There are a few other miscellaneous hooks: ones to extend the set of programs and preprocessors and one to override the function used to read the `.cabal` file. This hooks type is widely agreed to not be the right solution. Partly this is because changes to it usually break custom `Setup.hs` files and yet many internal code changes do require changes to the hooks. For example we cannot pass any extra parameters to most of the functions that implement the various phases because it would involve changing the types of the corresponding hook. At some point it will have to be replaced.
-
Distribution/Simple/Setup.hs (source) (docs): This is a big module, but not very complicated. The code is very regular and repetitive. It defines the command line interface for all the Cabal commands. For each command (like `configure`, `build` etc) it defines a type that holds all the flags, the default set of flags and a `Command` that maps command line flags to and from the corresponding flags type. All the flags types are instances of `Monoid`; see http://www.haskell.org/pipermail/cabal-devel/2007-December/001509.html for an explanation. The types defined here get used in the front end and especially in `cabal-install`, which has to do quite a bit of manipulating sets of command line flags. This is actually relatively nice; it works quite well. The main change it needs is to unify it with the code for managing sets of fields that can be read and written from files. This would allow us to save configure flags in config files.
-
Distribution/Simple.hs (source) (docs): This is the command line front end to the `Simple` build system. The original idea was that there could be different build systems that all presented the same compatible command line interfaces. There is still a `Make` system (see below) but in practice no packages use it. This module exports the main functions that `Setup.hs` scripts use. It re-exports the `UserHooks` type, the standard entry points like `defaultMain` and `defaultMainWithHooks` and the predefined sets of `UserHooks` that custom `Setup.hs` scripts can extend to add their own behaviour.
-
Distribution/Make.hs (source) (docs): This is an alternative build system that delegates everything to the `make` program. All the commands just end up calling make with appropriate arguments. The intention was to allow preexisting packages that used makefiles to be wrapped into Cabal packages. In practice essentially all such packages were converted over to the Simple build system instead. Consequently this module is probably not used much and it certainly only sees cursory maintenance and no testing. Perhaps at some point we should stop pretending that it works.
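For readers who have not seen one, a custom `Setup.hs` that extends the predefined Simple hooks, as described in the `UserHooks` and `Distribution.Simple` entries above, might look roughly like the following. This is only a sketch: it assumes the `postBuild` field keeps the four-argument shape it has had in Cabal releases to date, and the printed message and the name `hooks` are purely illustrative.

```haskell
import Distribution.Simple
  ( UserHooks (..)
  , defaultMainWithHooks
  , simpleUserHooks
  )

-- Start from the predefined Simple hooks and override a single field.
-- The original post-build hook is still run first, then our extra step.
hooks :: UserHooks
hooks =
  simpleUserHooks
    { postBuild = \args flags pkgDescr localBuildInfo -> do
        postBuild simpleUserHooks args flags pkgDescr localBuildInfo
        putStrLn "post-build hook ran (illustrative message)"
    }

-- Standard entry point for a Setup.hs that uses custom hooks.
main :: IO ()
main = defaultMainWithHooks hooks
```

Because each hook is just a record field, any change to a hook's type breaks scripts like this one, which is the fragility the `UserHooks` entry complains about.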