Packages installed on a Linux system can depend on other packages, for example an application depends on shared libraries that it uses. This means that the “depends on” relationships defines a graph over packages.

Visualizing the “depends on” relationship allows understand at a glance what are the core packages and why each package was installed; so let’s explore de dependencies of packets installed on a Debian-based Linux installation.

Dependencies of a package can be listed using apt-cache depends <packagename>. For example, the list of dependencies of vim are:

vim
  Depends: vim-common
  Depends: vim-runtime
  Depends: libacl1
  Depends: libc6
  Depends: libgpm2
  Depends: libpython2.7
  Depends: libselinux1
  Depends: libtinfo5
  Suggests: <ctags>
    exuberant-ctags:i386
    exuberant-ctags
  Suggests: vim-doc
  Suggests: vim-scripts
  Conflicts: vim:i386

So what we want to do is:

  • list all installed packages;
  • list dependencies for each installed package;
  • put dependencies in some format that allows easy visualization.

Listing installed packages

Listing installed packages is easily done with dpkg:

dpkg --get-selections | grep -v 'deinstall' | cut -f1 | cut -d: -f1

Let’s explain this command:

  1. dpkg --get-selections: list all packages;
  2. grep -v 'deinstall': remove all lines that contain the word “deinstall”, thus only packages whose selection state is “install” will be kept;
  3. cut -f1 | cut -d: -f1

Now we want to get the list dependencies for each of these packages.

Listing dependencies

apt-cache allows to easily list dependencies of a package, given its name. When parsing its output, we should keep in mind that alternative dependencies start with a pipe (“|“) and virtual packages are shown within angle brackets (<>).

Assuming that the variable $pkgname contains the name of a valid package, we can use the following command to list the packages it depends on:

apt-cache depends "$pkgname" | grep 'Depends: ' | grep -v '<' | cut -d':' -f2- | tr -d ' '

What it does is:

  1. apt-cache depends "$pkgname": list all dependencies for the $pkgname package;
  2. grep 'Depends:': keep only lines containing the "Depends" word (i.e., "Depends" and "PreDepends");
  3. grep -v "<": remove references to virtual packages;
  4. cut -d':' -f2-: keep only package names;
  5. tr -d ' ': trim possible whitespaces.

Let's put all in a bash script:

#!/bin/sh

tempf=$(tempfile)
edgesf=$(tempfile)
nodesf=$(tempfile)
dpkg --get-selections | grep -v 'deinstall' | cut -f1 | cut -d: -f1 > "$tempf"
while read pkgname
do
    apt-cache depends "$pkgname" | grep 'Depends: ' | grep -v '<' | cut -d':' -f2- | tr -d ' ' | ( while read dependsf; do
        echo "$pkgname depends on $dependsf"
    done)
done < "$tempf"
rm "$tempf"
rm "$edgesf"
rm "$nodesf"

Visualize dependencies

I choose Gephi to visualize the dependency graph.

A possible alternative would be dot from the GraphViz suite. However, on my desktop machine there are about 2.500 installed packages and more than 13.000 dependencies: this graph is big enough to make very hard for dot to find a reasonable layout for all the graph nodes and output an image that can be actually visualized.

Gephi supports several graph definition formats, including GEXF. GEXF is an XML-based format and it is very simple to adapt our script so that it outputs dependencies in the GEXF format:

#!/bin/sh

tempf=$(tempfile)
edgesf=$(tempfile)
nodesf=$(tempfile)

# write list of installed packages in $tempf file
dpkg --get-selections | grep -v 'deinstall' | cut -f1 | cut -d: -f1 > "$tempf"

# output GEXF header 
echo '<?xml version="1.0" encoding="UTF-8"?>'
echo '<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">'
echo '    <graph mode="static" defaultedgetype="directed">'

# read all packages from $tempf
while read pkgname
do
    # create a node declaration and write it to $nodesf
    echo "            <node id=\"$pkgname\" label=\"$pkgname\"/>" >> "$nodesf"

    # read all dependencies, create node definitions for each package and add an edge
    apt-cache depends "$pkgname" | grep 'Depends: ' | grep -v '<' | cut -d':' -f2- | tr -d ' ' | ( while read dependsfrom ; do
        echo "            <node id=\"$dependsfrom\" label=\"$dependsfrom\"/>" >> "$nodesf"
        echo "            <edge id=\"$dependsfrom-$pkgname\" source=\"$dependsfrom\" target=\"$pkgname\" />" >> "$edgesf"
    done)

done < "$tempf"
# create the <nodes> XML node. Use uniq to remove duplicated nodes
echo "        <nodes>"
sort "$nodesf" | uniq
echo "        </nodes>"

# create the <nodes> XML node. Use uniq to remove duplicated nodes
echo "        <edges>"
sort "$edgesf" | uniq
echo "        </edges>"
echo "    </graph>"
echo "</gexf>"

# remove temporary files
rm "$tempf"
rm "$edgesf"
rm "$nodesf"

Example

I run this script on a server running a barebone installation of Debian Wheezy. I loaded the resulting GEXF file in Gephi and I tweaked a bit the controls:

Screenshot of Gephi showing package dependencies of a Debian wheezy server.

Screenshot of Gephi showing package dependencies of a Debian wheezy server.

In particular, I ran the "ForceAtlas 2" layout algorithm to neatly position all nodes (bottom left corner in the screenshot), I filtered the nodes to remove those that have a 0 degree, such as font packages (bottom right corner), and I ranked nodes based on their outdegree: nodes with higher outdegree are more red.

A few more tweaks in the Preview section of Gephi allow to create something like this:

Graph of Debian wheezy base installation package dependencies.

Graph of Debian wheezy base installation package dependencies.

The red node at the center of the graph is libc6 that, unsurprisingly, many packages depend on. The other two red nodes on the top left are debconf and dpkg: due to the very small installation size, they are relatively strongly connected to the other packages. The two nodes just below libc6 are zlib1g and multiarch-support. Finally, the node on the bottom right is python.

Here is the same graph for a Kubuntu installation:

Package dependency graph of a laptop running Kubuntu.

Package dependency graph of a laptop running Kubuntu.

Gephi also gives metrics about the graphs it loads. For example, the average degree of this graph is 5.3, and its diameter is 12 (i.e., the longest path between any two connected nodes has 12 edges), the average path length is 3.7 and so on.

The Gephi team has collected several datasets to explore their software, available here.