Packages installed on a Linux system can depend on other packages: for example, an application depends on the shared libraries it uses. This means that the “depends on” relationship defines a graph over packages.
Visualizing the “depends on” relationship lets us understand at a glance which packages are the core ones and why each package was installed, so let’s explore the dependencies of the packages installed on a Debian-based Linux system.
The dependencies of a package can be listed using apt-cache depends <packagename>. For example, apt-cache depends vim lists the dependencies of vim.
So what we want to do is:
- list all installed packages;
- list dependencies for each installed package;
- put dependencies in some format that allows easy visualization.
Listing installed packages
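Listing installed packages is easily done with:
dpkg --get-selections | grep -v 'deinstall' | cut -f1 | cut -d: -f1
Let’s explain this command:
- dpkg --get-selections: list all packages known to dpkg, together with their selection state;
- grep -v 'deinstall': remove all lines that contain the word “deinstall”, so that only packages whose selection state is “install” are kept;
- cut -f1 | cut -d: -f1: keep only the package name, that is, the first tab-separated field, and strip any architecture suffix after the colon (for example, libc6:amd64 becomes libc6).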
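To see what each stage of the pipeline does without touching the real package database, the filter can be wrapped in a function and fed sample input. This is a minimal sketch: it assumes the tab-separated "name, state" layout of dpkg --get-selections, and both the function name installed_names and the sample package lines are made up for illustration.

```shell
# Hypothetical helper: reads `dpkg --get-selections`-style lines on stdin
# and prints one bare package name per line.
installed_names() {
    grep -v 'deinstall' |   # drop packages marked for removal
        cut -f1 |           # keep the first tab-separated field (the name)
        cut -d: -f1         # strip an architecture suffix such as ":amd64"
}

# Made-up sample in the assumed format (name<TAB>state).
printf 'libc6:amd64\tinstall\nvim\tinstall\noldpkg\tdeinstall\n' | installed_names
```

On a real system the same function would be used as dpkg --get-selections | installed_names.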
Now we want to get the list of dependencies for each of these packages.
apt-cache makes it easy to list the dependencies of a package, given its name. When parsing its output, we should keep in mind that alternative dependencies start with a pipe (“|”) and virtual packages are shown within angle brackets (“<” and “>”).
Assuming that the variable $pkgname contains the name of a valid package, we can use the following command to list the packages it depends on:
apt-cache depends "$pkgname" | grep 'Depends:' | grep -v "<" | cut -d':' -f2- | tr -d ' '
What it does is:
- apt-cache depends "$pkgname": list all dependencies of the $pkgname package;
- grep 'Depends:': keep only the lines containing "Depends:" (i.e., "Depends" and "PreDepends" entries);
- grep -v "<": remove references to virtual packages;
- cut -d':' -f2-: keep only what follows the first colon, that is, the package name;
- tr -d ' ': remove any remaining spaces.
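The effect of these filters is easier to check on a fixed input. The sketch below wraps them in a function that reads apt-cache depends output from stdin; the function name direct_deps and the sample text (including every package name in it) are hypothetical and only imitate the layout of real output.

```shell
# Hypothetical helper: reads `apt-cache depends`-style output on stdin
# and prints the names of the real (non-virtual) packages depended on.
direct_deps() {
    grep 'Depends:' |      # keep Depends and PreDepends lines
        grep -v '<' |      # drop virtual packages such as <virtual-thing>
        cut -d':' -f2- |   # keep what follows the first colon
        tr -d ' '          # remove the remaining spaces
}

# Made-up sample imitating the layout of `apt-cache depends` output.
direct_deps <<'EOF'
examplepkg
  PreDepends: libfoo1
  Depends: libbar2
 |Depends: baz
  Depends: <virtual-thing>
  Suggests: qux
EOF
```

Note that an alternative dependency (the line starting with a pipe) is kept like any other dependency, so every alternative ends up as an edge in the graph.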
Let's put it all together in a bash script:
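One way to combine the two pipelines is sketched below. The function names, the --run switch, and the output format (one "package dependency" pair per line) are assumptions made for this example, not necessarily what the original script did.

```shell
#!/bin/bash
# Sketch only: function names and the edge-list output format are assumptions.

# Names of all packages whose selection state is "install".
list_installed() {
    dpkg --get-selections | grep -v 'deinstall' | cut -f1 | cut -d: -f1
}

# Real (non-virtual) packages that package "$1" depends on.
list_deps() {
    apt-cache depends "$1" < /dev/null |
        grep 'Depends:' | grep -v '<' | cut -d':' -f2- | tr -d ' '
}

# Print one "package dependency" line per edge of the graph.
print_edges() {
    list_installed | while read -r pkgname; do
        list_deps "$pkgname" | while read -r dep; do
            printf '%s %s\n' "$pkgname" "$dep"
        done
    done
}

# Walk the package database only when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    print_edges
fi
```

Saved as, say, deps.sh, it could be invoked as ./deps.sh --run > edges.txt. Calling apt-cache once per package is slow on a large installation, but it keeps the script close to the commands explained above.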
I chose Gephi to visualize the dependency graph.
A possible alternative would be dot from the Graphviz suite. However, on my desktop machine there are about 2,500 installed packages and more than 13,000 dependencies: this graph is big enough to make it very hard for dot to find a reasonable layout for all the graph nodes and output an image that can actually be viewed.
Gephi supports several graph definition formats, including GEXF. GEXF is an XML-based format and it is very simple to adapt our script so that it outputs dependencies in the GEXF format:
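As an illustration of how little is needed, the following sketch turns a plain edge list into a GEXF 1.2 document with awk. Several things here are assumptions of the example: the function name edges_to_gexf, the input format (a line with two names is an edge, a line with one name declares an isolated node), the use of package names as node ids, and the edge direction (from a package to the package it depends on; swap source and target for the opposite orientation).

```shell
# Hypothetical helper: reads "source target" (edge) or "name" (isolated node)
# lines on stdin and prints a GEXF 1.2 document on stdout.
edges_to_gexf() {
    awk '
        # Remember each node once, in order of first appearance.
        function add(name) {
            if (!(name in seen)) { seen[name] = 1; nodes[++n] = name }
        }
        NF >= 1 { add($1) }
        NF >= 2 { add($2); src[++m] = $1; dst[m] = $2 }
        END {
            print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            print "<gexf xmlns=\"http://www.gexf.net/1.2draft\" version=\"1.2\">"
            print "  <graph mode=\"static\" defaultedgetype=\"directed\">"
            print "    <nodes>"
            for (i = 1; i <= n; i++)
                printf "      <node id=\"%s\" label=\"%s\" />\n", nodes[i], nodes[i]
            print "    </nodes>"
            print "    <edges>"
            for (i = 1; i <= m; i++)
                printf "      <edge id=\"%d\" source=\"%s\" target=\"%s\" />\n", i - 1, src[i], dst[i]
            print "    </edges>"
            print "  </graph>"
            print "</gexf>"
        }
    '
}

# Made-up sample: two edges and one isolated node.
printf 'vim libc6\nvim vim-common\nfonts-example\n' | edges_to_gexf
```

Names are written into the XML without escaping, which should be acceptable here because Debian package names are limited to lowercase letters, digits, "+", "-" and ".".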
I ran this script on a server running a bare-bones installation of Debian Wheezy. I loaded the resulting GEXF file in Gephi and tweaked the controls a bit:
In particular, I ran the "ForceAtlas 2" layout algorithm to neatly position all nodes (bottom left corner in the screenshot), I filtered out the nodes with degree 0, such as font packages (bottom right corner), and I ranked nodes by their outdegree: the higher the outdegree, the redder the node.
A few more tweaks in the Preview section of Gephi make it possible to create something like this:
The red node at the center of the graph is libc6, which, unsurprisingly, many packages depend on. The other two red nodes on the top left are […] dpkg: due to the very small installation size, they are relatively strongly connected to the other packages. The two nodes just below […] multiarch-support. Finally, the node on the bottom right is […]
Here is the same graph for a Kubuntu installation:
Gephi also gives metrics about the graphs it loads. For example, the average degree of this graph is 5.3, its diameter is 12 (i.e., the longest shortest path between any two connected nodes has 12 edges), the average path length is 3.7, and so on.
The Gephi team has collected several datasets to explore their software, available here.