<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Pprof-It on Jake Bailey</title>
    <link>https://jakebailey.dev/tags/pprof-it/</link>
    <description>Recent content in Pprof-It on Jake Bailey</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Sun, 26 Mar 2023 13:29:45 -0700</lastBuildDate>
    <atom:link href="https://jakebailey.dev/tags/pprof-it/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Speeding up pnpm</title>
      <link>https://jakebailey.dev/posts/pnpm-dt-2/</link>
      <pubDate>Sun, 26 Mar 2023 13:29:45 -0700</pubDate>
      <guid>https://jakebailey.dev/posts/pnpm-dt-2/</guid>
      <description>DefinitelyTyped contains over 9,000 packages. What could go wrong?</description>
      <content:encoded><![CDATA[<h2 id="background">Background</h2>
<p>For more background, see the <a href="https://jakebailey.dev/posts/pnpm-dt-1/">previous post about DefinitelyTyped</a>.</p>
<p>TL;DR: DefinitelyTyped is huge; installing it in its entirety involves
processing <em>over 9,000</em> packages. And that&rsquo;s slow! Or is it?</p>
<h2 id="taking-a-profile">Taking a profile</h2>
<p>Many people may not know this, but I&rsquo;ve actually written more Go than I have
TypeScript.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> As such, when I have a performance problem, the tool I like to
use is <a href="https://github.com/google/pprof">pprof</a>.</p>
<p>More commonly, this tool is used to profile Go, C, or C++ code. And I like this
tool! Lucky for me, there is
<a href="https://www.npmjs.com/package/@datadog/pprof">a library</a> which lets you use it
with Node.<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup> The API is pretty straightforward; you can start and stop
both CPU and heap profiles, and write them to disk.</p>
<p>Unfortunately, that&rsquo;s a little annoying, because effectively 100% of the time,
I&rsquo;m profiling a CLI application or someone else&rsquo;s project where I don&rsquo;t really
want to inject the code. The library does include some code to let you do
<code>node --require=pprof myScript.js</code>, but there&rsquo;s no way to configure its
behavior.</p>
<p>So a few years ago, I made a little wrapper,
<a href="https://www.npmjs.com/package/pprof-it">pprof-it</a>, which makes things much
easier to use. You can check the README for more details, but in short, to get a
pprof profile you just run:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-plaintext" data-lang="plaintext"><span class="line"><span class="cl">$ pprof-it /path/to/script.js
</span></span></code></pre></div><p><code>pprof-it</code> will start profiling both CPU and heap allocation immediately at
startup, then dump profiles to the current directory on exit. These files can
then be loaded into <code>pprof</code> (or one of the many other tools which support the
format, like <a href="https://flamegraph.com">flamegraph.com</a> or
<a href="https://www.speedscope.app">speedscope</a>).</p>
<p>So, let&rsquo;s take a profile of <code>pnpm install</code> on one of my work-in-progress &ldquo;DT as
a monorepo&rdquo; branches. (Forgive the roundabout way of running things; some of my
fixes are already released, so I need to do a little movie magic.)</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-plaintext" data-lang="plaintext"><span class="line"><span class="cl">$ npx --package=pnpm@7.30.0 -c &#39;pprof-it $(which pnpm) install&#39;
</span></span></code></pre></div><p>This actually OOMs on my laptop (I have yet to determine why), but on my
desktop, I get this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-plaintext" data-lang="plaintext"><span class="line"><span class="cl">pprof-it: Starting profilers (heap, time)
</span></span><span class="line"><span class="cl">    # a very long pause...
</span></span><span class="line"><span class="cl">Scope: all 9031 workspace projects
</span></span><span class="line"><span class="cl">    # a very very long warning about cycles (I need to file an issue for this!)
</span></span><span class="line"><span class="cl">Lockfile is up to date, resolution step is skipped
</span></span><span class="line"><span class="cl">Already up to date
</span></span><span class="line"><span class="cl">    # another long pause
</span></span><span class="line"><span class="cl">Done in 1m 39.7s
</span></span><span class="line"><span class="cl">pprof-it: Stopping profilers
</span></span><span class="line"><span class="cl">pprof-it: Writing heap profile to pprof-heap-286252.pb.gz
</span></span><span class="line"><span class="cl">pprof-it: Writing time profile to pprof-time-286252.pb.gz
</span></span></code></pre></div><p>Great, now let&rsquo;s run pprof:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-plaintext" data-lang="plaintext"><span class="line"><span class="cl">$ pprof -http=: pprof-time-286252.pb.gz
</span></span></code></pre></div><p>Automatically, <code>pprof</code> starts up my browser and puts me right into the graph
view. Outside of Node profiles, this view is very useful, but Node profiles have
an unfortunate problem which leads to all anonymous (e.g. arrow) functions being
counted as one node named &ldquo;(anonymous)&rdquo;.<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup> So, let&rsquo;s flip into the
flame view.</p>
<p><img alt="A pprof profile of the original test case; two large blocks. The overall execution takes about 100 seconds." loading="lazy" src="/posts/pnpm-dt-2/profile1.png#center"></p>
<p>Already, I&rsquo;m excited; this is every profiler&rsquo;s dream. Two very obvious chunks of
work attributed to real names I can search for. Roughly 50 seconds are spent in
<code>createPkgGraph</code> and another 32 seconds in <code>getRootPackagesToLink</code>. I should
note that at this point in my adventure, I know <em>absolutely nothing</em> about how
<code>pnpm</code> works; I haven&rsquo;t even checked out the repo. But, now I know exactly where
to look! (If <code>pnpm</code> had been minified, I&rsquo;d be in a much worse position.)</p>
<h2 id="working-through-the-code">Working through the code</h2>
<p>From the get-go I can see that there&rsquo;s a lot of time spent in <code>resolve</code>. One
thing I hadn&rsquo;t mentioned was how I set up this huge monorepo; my
<a href="https://github.com/jakebailey/DefinitelyTyped/tree/blog-pnpm-workspaces-with-paths">initial version</a>
of the monorepo transition used version specifiers like <code>workspace:../node</code> to
directly map packages to each other, avoiding the need for us to specify
names/versions in every <code>package.json</code> (they&rsquo;re already auto-generated by the DT
publisher). Without even looking at the code, I (correctly) guessed that these
paths were involved in the slowdown and
<a href="https://github.com/pnpm/pnpm/issues/6277">filed an issue</a>.</p>
<p>It turns out that this path mapping is actually a negative for other reasons as
well, so I just rewrote my transform to use versions instead of paths. After
switching to this
<a href="https://github.com/jakebailey/DefinitelyTyped/tree/blog-pnpm-workspaces-with-versions">new version</a>,
the profile looks like this:</p>
<p><img alt="A pprof profile of the &ldquo;no paths&rdquo; test case, two large blocks, first one smaller than before. The overall execution takes about 65 seconds." loading="lazy" src="/posts/pnpm-dt-2/profile2.png#center"></p>
<p>Alright, that&rsquo;s better already, down from ~100 seconds to 64 seconds. We&rsquo;ll come
back to <code>resolve</code> later.</p>
<h2 id="createpkggraph"><code>createPkgGraph</code></h2>
<p>The first block is the first &ldquo;very long pause&rdquo; (which happens even in the &ldquo;new&rdquo;
version of the repo), so let&rsquo;s start there. Searching the <code>pnpm</code> codebase, I
find the offending function. It looks something like this (cut down for
brevity):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ts" data-lang="ts"><span class="line"><span class="cl"><span class="kd">function</span> <span class="nx">createPkgGraph</span><span class="p">(</span><span class="nx">pkgs</span>: <span class="kt">Array</span><span class="p">&lt;</span><span class="nt">Package</span><span class="p">&gt;)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kr">const</span> <span class="nx">pkgMap</span> <span class="o">=</span> <span class="nx">createPkgMap</span><span class="p">(</span><span class="nx">pkgs</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="nx">mapValues</span><span class="p">((</span><span class="nx">pkg</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">({</span>
</span></span><span class="line"><span class="cl">        <span class="nx">dependencies</span>: <span class="kt">createNode</span><span class="p">(</span><span class="nx">pkg</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="kr">package</span><span class="o">:</span> <span class="nx">pkg</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="p">}),</span> <span class="nx">pkgMap</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">function</span> <span class="nx">createNode</span><span class="p">(</span><span class="nx">pkg</span>: <span class="kt">Package</span><span class="p">)</span><span class="o">:</span> <span class="kt">string</span><span class="p">[]</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="kr">const</span> <span class="nx">dependencies</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="p">...</span><span class="nx">pkg</span><span class="p">.</span><span class="nx">manifest</span><span class="p">.</span><span class="nx">devDependencies</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="p">...</span><span class="nx">pkg</span><span class="p">.</span><span class="nx">manifest</span><span class="p">.</span><span class="nx">optionalDependencies</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="p">...</span><span class="nx">pkg</span><span class="p">.</span><span class="nx">manifest</span><span class="p">.</span><span class="nx">dependencies</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="p">};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="nb">Object</span><span class="p">.</span><span class="nx">entries</span><span class="p">(</span><span class="nx">dependencies</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="p">.</span><span class="nx">map</span><span class="p">(([</span><span class="nx">depName</span><span class="p">,</span> <span class="nx">rawSpec</span><span class="p">])</span> <span class="o">=&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="kr">const</span> <span class="nx">isWorkspaceSpec</span> <span class="o">=</span> <span class="nx">rawSpec</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s2">&#34;workspace:&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">                <span class="kr">const</span> <span class="nx">spec</span> <span class="o">=</span> <span class="nx">npa</span><span class="p">.</span><span class="nx">resolve</span><span class="p">(</span><span class="nx">depName</span><span class="p">,</span> <span class="nx">rawSpec</span><span class="p">,</span> <span class="nx">pkg</span><span class="p">.</span><span class="nx">dir</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="k">if</span> <span class="p">(</span><span class="nx">spec</span><span class="p">.</span><span class="kr">type</span> <span class="o">===</span> <span class="s2">&#34;directory&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                    <span class="kr">const</span> <span class="nx">matchedPkg</span> <span class="o">=</span> <span class="nb">Object</span><span class="p">.</span><span class="nx">values</span><span class="p">(</span><span class="nx">pkgMap</span><span class="p">).</span><span class="nx">find</span><span class="p">((</span><span class="nx">pkg</span><span class="p">)</span> <span class="o">=&gt;</span>
</span></span><span class="line"><span class="cl">                        <span class="nx">path</span><span class="p">.</span><span class="nx">relative</span><span class="p">(</span><span class="nx">pkg</span><span class="p">.</span><span class="nx">dir</span><span class="p">,</span> <span class="nx">spec</span><span class="p">.</span><span class="nx">fetchSpec</span><span class="p">)</span> <span class="o">===</span> <span class="s2">&#34;&#34;</span>
</span></span><span class="line"><span class="cl">                    <span class="p">);</span>
</span></span><span class="line"><span class="cl">                    <span class="k">return</span> <span class="nx">matchedPkg</span><span class="o">?</span><span class="p">.</span><span class="nx">dir</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="kr">const</span> <span class="nx">pkgs</span> <span class="o">=</span> <span class="nb">Object</span><span class="p">.</span><span class="nx">values</span><span class="p">(</span><span class="nx">pkgMap</span><span class="p">).</span><span class="nx">filter</span><span class="p">((</span><span class="nx">pkg</span><span class="p">)</span> <span class="o">=&gt;</span>
</span></span><span class="line"><span class="cl">                    <span class="nx">pkg</span><span class="p">.</span><span class="nx">manifest</span><span class="p">.</span><span class="nx">name</span> <span class="o">===</span> <span class="nx">depName</span>
</span></span><span class="line"><span class="cl">                <span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="k">if</span> <span class="p">(</span><span class="nx">pkgs</span><span class="p">.</span><span class="nx">length</span> <span class="o">===</span> <span class="mi">0</span><span class="p">)</span> <span class="k">return</span> <span class="s2">&#34;&#34;</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="kr">const</span> <span class="nx">versions</span> <span class="o">=</span> <span class="nx">pkgs</span><span class="p">.</span><span class="nx">filter</span><span class="p">(({</span> <span class="nx">manifest</span> <span class="p">})</span> <span class="o">=&gt;</span> <span class="nx">manifest</span><span class="p">.</span><span class="nx">version</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                    <span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">pkg</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="nx">pkg</span><span class="p">.</span><span class="nx">manifest</span><span class="p">.</span><span class="nx">version</span><span class="p">)</span> <span class="kr">as</span> <span class="kt">string</span><span class="p">[];</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="k">if</span> <span class="p">(</span><span class="nx">isWorkspaceSpec</span> <span class="o">&amp;&amp;</span> <span class="nx">versions</span><span class="p">.</span><span class="nx">length</span> <span class="o">===</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                    <span class="kr">const</span> <span class="nx">matchedPkg</span> <span class="o">=</span> <span class="nx">pkgs</span><span class="p">.</span><span class="nx">find</span><span class="p">((</span><span class="nx">pkg</span><span class="p">)</span> <span class="o">=&gt;</span>
</span></span><span class="line"><span class="cl">                        <span class="nx">pkg</span><span class="p">.</span><span class="nx">manifest</span><span class="p">.</span><span class="nx">name</span> <span class="o">===</span> <span class="nx">depName</span>
</span></span><span class="line"><span class="cl">                    <span class="p">);</span>
</span></span><span class="line"><span class="cl">                    <span class="k">return</span> <span class="nx">matchedPkg</span><span class="o">!</span><span class="p">.</span><span class="nx">dir</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="k">if</span> <span class="p">(</span><span class="nx">versions</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="nx">rawSpec</span><span class="p">))</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                    <span class="kr">const</span> <span class="nx">matchedPkg</span> <span class="o">=</span> <span class="nx">pkgs</span><span class="p">.</span><span class="nx">find</span><span class="p">((</span><span class="nx">pkg</span><span class="p">)</span> <span class="o">=&gt;</span>
</span></span><span class="line"><span class="cl">                        <span class="nx">pkg</span><span class="p">.</span><span class="nx">manifest</span><span class="p">.</span><span class="nx">name</span> <span class="o">===</span> <span class="nx">depName</span>
</span></span><span class="line"><span class="cl">                        <span class="o">&amp;&amp;</span> <span class="nx">pkg</span><span class="p">.</span><span class="nx">manifest</span><span class="p">.</span><span class="nx">version</span> <span class="o">===</span> <span class="nx">rawSpec</span>
</span></span><span class="line"><span class="cl">                    <span class="p">);</span>
</span></span><span class="line"><span class="cl">                    <span class="k">return</span> <span class="nx">matchedPkg</span><span class="o">!</span><span class="p">.</span><span class="nx">dir</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="c1">// ...
</span></span></span><span class="line"><span class="cl">            <span class="p">})</span>
</span></span><span class="line"><span class="cl">            <span class="p">.</span><span class="nx">filter</span><span class="p">(</span><span class="nb">Boolean</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>Alright, so we can sort of see what might be going on here. First off, we have
<code>pkgMap</code>. By attaching to the code and looking at the variable, we find that
it&rsquo;s an object which consists of all 9,000+ packages. So doing anything with
that is going to take a while.</p>
<p>At the top level, we&rsquo;re already looping over every entry in the object via
ramda&rsquo;s <code>mapValues</code>. But, if we look inside <code>createNode</code>, we can see that it is
<em>also</em> looping over all of <code>pkgMap</code> by calling <code>Object.values(pkgMap)</code>! This is
quadratic; each of the 9,000+ packages triggers scans over all 9,000+ entries, which is on the
order of 9,000 x 9,000 (about 81 million) element visits. We could fix this
by building a lookup table up front and accessing that instead. For example, one of the
loops is just looking for all of the entries in <code>pkgMap</code> where
<code>pkg.manifest.name</code> is some value. We could precalculate this mapping, producing
an object of type <code>Record&lt;string, Package[]&gt;</code>.</p>
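<p>As a rough sketch of that idea (not the code that ended up in <code>pnpm</code>), the table can be built in a single pass and then consulted in place of the <code>filter</code>. The <code>Pkg</code> type, the <code>indexByName</code> helper, and the sample data are all hypothetical stand-ins made up for illustration:</p>

```typescript
// Hypothetical, trimmed-down stand-in for pnpm's package type.
interface Pkg {
    dir: string;
    manifest: { name?: string; version?: string };
}

// Build the name -> packages table once, in a single pass over all packages.
function indexByName(all: Pkg[]): Record<string, Pkg[]> {
    // Object.create(null) avoids collisions with inherited keys like "constructor".
    const table: Record<string, Pkg[]> = Object.create(null);
    for (const pkg of all) {
        const pkgName = pkg.manifest.name;
        if (pkgName === undefined) continue;
        const bucket = table[pkgName];
        if (bucket) {
            bucket.push(pkg);
        } else {
            table[pkgName] = [pkg];
        }
    }
    return table;
}

// Made-up sample data; the paths and versions are illustrative only.
const pkgs: Pkg[] = [
    { dir: "/dt/types/node", manifest: { name: "@types/node", version: "18.15.0" } },
    { dir: "/dt/types/node/v16", manifest: { name: "@types/node", version: "16.18.0" } },
    { dir: "/dt/types/react", manifest: { name: "@types/react", version: "18.0.0" } },
];

const byName = indexByName(pkgs);

// Stands in for: Object.values(pkgMap).filter((pkg) => pkg.manifest.name === depName)
const nodePkgs = byName["@types/node"] ?? [];
const missing = byName["@types/does-not-exist"] ?? [];
```

<p>Each lookup is now a single property access instead of a scan over every package, and the buckets keep the original order, so later <code>find</code> calls over <code>pkgs</code> should behave the same as before.</p>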
<p>The other loop is more complicated; this is where <code>resolve</code> comes in, since
<code>path.relative</code> resolves both of its arguments internally. We can see
that we&rsquo;re searching not for a specific name but for a specific package
whose path matches the one we specified (that <code>workspace:../node</code> from earlier).
This one is tricky, but it&rsquo;s possible that we could precalculate some table here
too, depending on how sensitive this code is to <code>path.relative</code>&rsquo;s
platform-specific semantics.</p>
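<p>A minimal sketch of what such a table might look like, assuming directories can be compared after some normalization step. The <code>normalizeDir</code> helper here is a deliberately naive, POSIX-only placeholder that I made up; real code would more likely lean on Node&rsquo;s <code>path</code> module and would have to think about things like case-insensitive file systems. <code>DirPkg</code> and <code>indexByDir</code> are likewise hypothetical:</p>

```typescript
// Hypothetical, trimmed-down stand-in for pnpm's package type.
interface DirPkg {
    dir: string;
    manifest: { name?: string };
}

// Naive placeholder normalizer (an assumption, not pnpm's logic):
// strips trailing slashes so "/a/b/" and "/a/b" share a key.
function normalizeDir(dir: string): string {
    const trimmed = dir.replace(/\/+$/, "");
    return trimmed === "" ? "/" : trimmed;
}

// Build the directory -> package table once.
function indexByDir(all: DirPkg[]): Map<string, DirPkg> {
    const table = new Map<string, DirPkg>();
    for (const pkg of all) {
        const key = normalizeDir(pkg.dir);
        // Keep the first package per directory, mirroring Array.prototype.find.
        if (!table.has(key)) table.set(key, pkg);
    }
    return table;
}

// Made-up sample data.
const dirPkgs: DirPkg[] = [
    { dir: "/dt/types/node", manifest: { name: "@types/node" } },
    { dir: "/dt/types/react", manifest: { name: "@types/react" } },
];

const byDir = indexByDir(dirPkgs);

// Stands in for: pkgs.find((pkg) => path.relative(pkg.dir, spec.fetchSpec) === "")
const matched = byDir.get(normalizeDir("/dt/types/node/"));
const unmatched = byDir.get(normalizeDir("/dt/types/nope"));
```

<p>The whole trick rests on the normalizer producing the same key for any two paths that <code>path.relative</code> would consider identical, which is exactly the platform-sensitive part.</p>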
<p>Speaking of precalculating&hellip; We just said that <code>pkgMap</code> was huge. But, for
every call to <code>createNode</code>, we call <code>Object.values(pkgMap)</code> (once per dependency, in fact)! The profile doesn&rsquo;t
call this out explicitly, but building a fresh 9,000+ element array each time is really, really expensive. The good news is that
<code>pkgMap</code> is never modified. This means that we could calculate this big array
once and then reuse it, for example:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ts" data-lang="ts"><span class="line"><span class="cl"><span class="kd">function</span> <span class="nx">createPkgGraph</span><span class="p">(</span><span class="nx">pkgs</span>: <span class="kt">Array</span><span class="p">&lt;</span><span class="nt">Package</span><span class="p">&gt;)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kr">const</span> <span class="nx">pkgMap</span> <span class="o">=</span> <span class="nx">createPkgMap</span><span class="p">(</span><span class="nx">pkgs</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="kr">const</span> <span class="nx">pkgMapValues</span> <span class="o">=</span> <span class="nb">Object</span><span class="p">.</span><span class="nx">values</span><span class="p">(</span><span class="nx">pkgMap</span><span class="p">);</span> <span class="c1">// &lt;-- NEW!
</span></span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="nx">mapValues</span><span class="p">((</span><span class="nx">pkg</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">({</span>
</span></span><span class="line"><span class="cl">        <span class="nx">dependencies</span>: <span class="kt">createNode</span><span class="p">(</span><span class="nx">pkg</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="kr">package</span><span class="o">:</span> <span class="nx">pkg</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="p">}),</span> <span class="nx">pkgMap</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kd">function</span> <span class="nx">createNode</span><span class="p">(</span><span class="nx">pkg</span>: <span class="kt">Package</span><span class="p">)</span><span class="o">:</span> <span class="kt">string</span><span class="p">[]</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="c1">// ...
</span></span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="nb">Object</span><span class="p">.</span><span class="nx">entries</span><span class="p">(</span><span class="nx">dependencies</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="p">.</span><span class="nx">map</span><span class="p">(([</span><span class="nx">depName</span><span class="p">,</span> <span class="nx">rawSpec</span><span class="p">])</span> <span class="o">=&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="c1">// ...
</span></span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="k">if</span> <span class="p">(</span><span class="nx">spec</span><span class="p">.</span><span class="kr">type</span> <span class="o">===</span> <span class="s2">&#34;directory&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                    <span class="kr">const</span> <span class="nx">matchedPkg</span> <span class="o">=</span> <span class="nx">pkgMapValues</span><span class="p">.</span><span class="nx">find</span><span class="p">((</span><span class="nx">pkg</span><span class="p">)</span> <span class="o">=&gt;</span>
</span></span><span class="line"><span class="cl">                        <span class="nx">path</span><span class="p">.</span><span class="nx">relative</span><span class="p">(</span><span class="nx">pkg</span><span class="p">.</span><span class="nx">dir</span><span class="p">,</span> <span class="nx">spec</span><span class="p">.</span><span class="nx">fetchSpec</span><span class="p">)</span> <span class="o">===</span> <span class="s2">&#34;&#34;</span>
</span></span><span class="line"><span class="cl">                    <span class="p">);</span>
</span></span><span class="line"><span class="cl">                    <span class="k">return</span> <span class="nx">matchedPkg</span><span class="o">?</span><span class="p">.</span><span class="nx">dir</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">                <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="kr">const</span> <span class="nx">pkgs</span> <span class="o">=</span> <span class="nx">pkgMapValues</span><span class="p">.</span><span class="nx">filter</span><span class="p">((</span><span class="nx">pkg</span><span class="p">)</span> <span class="o">=&gt;</span>
</span></span><span class="line"><span class="cl">                    <span class="nx">pkg</span><span class="p">.</span><span class="nx">manifest</span><span class="p">.</span><span class="nx">name</span> <span class="o">===</span> <span class="nx">depName</span>
</span></span><span class="line"><span class="cl">                <span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="c1">// ...
</span></span></span><span class="line"><span class="cl">            <span class="p">})</span>
</span></span><span class="line"><span class="cl">            <span class="p">.</span><span class="nx">filter</span><span class="p">(</span><span class="nb">Boolean</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>This turns out to save the bulk of the time. Yay!</p>
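<p>To see the effect in isolation, here is a small synthetic sketch; it is not taken from <code>pnpm</code>. The object size is borrowed from the numbers above, while the keys, the <code>lookups</code> list, and both helper functions are invented for the example. The two functions return the same answer; the only difference is how often the big array gets materialized:</p>

```typescript
// Synthetic stand-in for pkgMap; the contents are made up.
const pkgMap: Record<string, { name: string }> = {};
for (let i = 0; i < 9000; i++) {
    pkgMap[`/dt/types/pkg${i}`] = { name: `pkg${i}` };
}

// Mirrors the original shape: a fresh 9,000-element array per lookup.
function countMatchesRebuilding(names: string[]): number {
    let hits = 0;
    for (const wanted of names) {
        if (Object.values(pkgMap).some((p) => p.name === wanted)) hits++;
    }
    return hits;
}

// Mirrors the fix: the array is built once and reused for every lookup.
function countMatchesHoisted(names: string[]): number {
    const values = Object.values(pkgMap);
    let hits = 0;
    for (const wanted of names) {
        if (values.some((p) => p.name === wanted)) hits++;
    }
    return hits;
}

const lookups = ["pkg0", "pkg4500", "pkg8999", "missing"];
const rebuilt = countMatchesRebuilding(lookups);
const hoisted = countMatchesHoisted(lookups);
```

<p>Wrapping each call in a timer with a much longer <code>lookups</code> list is one way to get a feel for the gap on a given machine; the scans are still linear either way, which is why this change alone doesn&rsquo;t remove the quadratic behavior.</p>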
<p>Algorithmically, the code is still quadratic, but it&rsquo;s a lot faster, and
this kind of change is very safe, safe enough to be backported. I sent this one
as a <a href="https://github.com/pnpm/pnpm/pull/6281">quick PR</a>, and it&rsquo;s now out in
v7.30.4.</p>
<p>The fix to the quadratic-ness is going to be a different, more complicated
change I plan to send later.</p>
<p><strong>UPDATE:</strong> Later is now the past! All of the quadratic-ness has been fixed as
of:</p>
<ul>
<li><a href="https://github.com/pnpm/pnpm/pull/6287">perf(pkgs-graph): speed up createPkgGraph by using a table for manifest name lookup</a></li>
<li><a href="https://github.com/pnpm/pnpm/pull/6317">perf(pkgs-graph): speed up createPkgGraph when directory specifiers are present</a></li>
</ul>
<h2 id="getrootpackagestolink"><code>getRootPackagesToLink</code></h2>
<p>Let&rsquo;s look at the second big chunk. Cut down for brevity again, we have:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ts" data-lang="ts"><span class="line"><span class="cl"><span class="kr">async</span> <span class="kd">function</span> <span class="nx">getRootPackagesToLink</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="nx">lockfile</span>: <span class="kt">Lockfile</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="nx">opts</span><span class="o">:</span> <span class="p">{</span><span class="cm">/* some options */</span><span class="p">},</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kr">const</span> <span class="nx">importerManifestsByImporterId</span> <span class="o">=</span> <span class="p">{};</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="p">(</span><span class="kr">const</span> <span class="p">{</span> <span class="nx">id</span><span class="p">,</span> <span class="nx">manifest</span> <span class="p">}</span> <span class="k">of</span> <span class="nx">opts</span><span class="p">.</span><span class="nx">projects</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nx">importerManifestsByImporterId</span><span class="p">[</span><span class="nx">id</span><span class="p">]</span> <span class="o">=</span> <span class="nx">manifest</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kr">const</span> <span class="nx">projectSnapshot</span> <span class="o">=</span> <span class="nx">lockfile</span><span class="p">.</span><span class="nx">importers</span><span class="p">[</span><span class="nx">opts</span><span class="p">.</span><span class="nx">importerId</span><span class="p">];</span>
</span></span><span class="line"><span class="cl">    <span class="kr">const</span> <span class="nx">allDeps</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="p">...</span><span class="nx">projectSnapshot</span><span class="p">.</span><span class="nx">devDependencies</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="p">...</span><span class="nx">projectSnapshot</span><span class="p">.</span><span class="nx">dependencies</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="p">...</span><span class="nx">projectSnapshot</span><span class="p">.</span><span class="nx">optionalDependencies</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="p">};</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="p">(</span><span class="k">await</span> <span class="nx">Promise</span><span class="p">.</span><span class="nx">all</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="nb">Object</span><span class="p">.</span><span class="nx">entries</span><span class="p">(</span><span class="nx">allDeps</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="p">.</span><span class="nx">map</span><span class="p">(</span><span class="kr">async</span> <span class="p">([</span><span class="nx">alias</span><span class="p">,</span> <span class="nx">ref</span><span class="p">])</span> <span class="o">=&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="c1">// ...
</span></span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">                <span class="k">return</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                    <span class="c1">// a bunch of props
</span></span></span><span class="line"><span class="cl">                <span class="p">};</span>
</span></span><span class="line"><span class="cl">            <span class="p">}),</span>
</span></span><span class="line"><span class="cl">    <span class="p">))</span>
</span></span><span class="line"><span class="cl">        <span class="p">.</span><span class="nx">filter</span><span class="p">(</span><span class="nb">Boolean</span><span class="p">)</span> <span class="kr">as</span> <span class="nx">LinkedDirectDep</span><span class="p">[];</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>Again, the profile is not being very specific. It&rsquo;s just saying that a lot of
time is being spent in <code>getRootPackagesToLink</code>. Thankfully, there&rsquo;s not much
code actually inside this function. It can only be the calculation of
<code>importerManifestsByImporterId</code>, or the spread to produce <code>allDeps</code>.</p>
<p>I debugged this to get a sense of how big each of these is.
<code>getRootPackagesToLink</code> is called once for every package in the repo, but
<code>allDeps</code> is small each time. So the spread is not likely to be it.</p>
<p>The <code>importerManifestsByImporterId</code> loop, on the other hand, is suspicious. I
just said that <code>getRootPackagesToLink</code> is called once per package in the repo.
But, <code>opts.projects</code> <em>is</em> a big list of all packages in the repo! We&rsquo;re
quadratic again!</p>
<p>This is better than before, in theory; there are lookups inside the <code>.map</code> call
later in the function, but they&rsquo;re efficient because they don&rsquo;t loop over <code>opts.projects</code> (as
opposed to <code>createNode</code> from earlier, which <em>does</em> do the linear lookup). But
<code>getRootPackagesToLink</code> is recreating this mapping every single time it&rsquo;s
called!</p>
<p>If we scroll down a little bit, we can find its sole caller:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ts" data-lang="ts"><span class="line"><span class="cl"><span class="kr">const</span> <span class="nx">projectsToLink</span> <span class="o">=</span> <span class="nb">Object</span><span class="p">.</span><span class="nx">fromEntries</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="k">await</span> <span class="nx">Promise</span><span class="p">.</span><span class="nx">all</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="nx">projects</span><span class="p">.</span><span class="nx">map</span><span class="p">(</span><span class="kr">async</span> <span class="p">({</span> <span class="nx">rootDir</span><span class="p">,</span> <span class="nx">id</span><span class="p">,</span> <span class="nx">modulesDir</span> <span class="p">})</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="nx">id</span><span class="p">,</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nx">dir</span>: <span class="kt">rootDir</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="nx">modulesDir</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="nx">dependencies</span>: <span class="kt">await</span> <span class="nx">getRootPackagesToLink</span><span class="p">(</span><span class="nx">filteredLockfile</span><span class="p">,</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="c1">// ...
</span></span></span><span class="line"><span class="cl">                <span class="nx">projects</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                <span class="c1">// ...
</span></span></span><span class="line"><span class="cl">            <span class="p">}),</span>
</span></span><span class="line"><span class="cl">        <span class="p">}]),</span>
</span></span><span class="line"><span class="cl">    <span class="p">),</span>
</span></span><span class="line"><span class="cl"><span class="p">);</span>
</span></span></code></pre></div><p>There&rsquo;s that &ldquo;for each package&rdquo; thing again. Thankfully, we can again see that
<code>projects</code> is not changing between calls. So, we can instead calculate this
mapping <em>once</em> and pass it in to <code>getRootPackagesToLink</code>, again without changing
much logic.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ts" data-lang="ts"><span class="line"><span class="cl"><span class="kr">const</span> <span class="nx">importerManifestsByImporterId</span> <span class="o">=</span> <span class="p">{}</span> <span class="kr">as</span> <span class="p">{</span> <span class="p">[</span><span class="nx">id</span>: <span class="kt">string</span><span class="p">]</span><span class="o">:</span> <span class="nx">ProjectManifest</span><span class="p">;</span> <span class="p">};</span>
</span></span><span class="line"><span class="cl"><span class="k">for</span> <span class="p">(</span><span class="kr">const</span> <span class="p">{</span> <span class="nx">id</span><span class="p">,</span> <span class="nx">manifest</span> <span class="p">}</span> <span class="k">of</span> <span class="nx">projects</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nx">importerManifestsByImporterId</span><span class="p">[</span><span class="nx">id</span><span class="p">]</span> <span class="o">=</span> <span class="nx">manifest</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kr">const</span> <span class="nx">projectsToLink</span> <span class="o">=</span> <span class="nb">Object</span><span class="p">.</span><span class="nx">fromEntries</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="k">await</span> <span class="nx">Promise</span><span class="p">.</span><span class="nx">all</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="nx">projects</span><span class="p">.</span><span class="nx">map</span><span class="p">(</span><span class="kr">async</span> <span class="p">({</span> <span class="nx">rootDir</span><span class="p">,</span> <span class="nx">id</span><span class="p">,</span> <span class="nx">modulesDir</span> <span class="p">})</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="nx">id</span><span class="p">,</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nx">dir</span>: <span class="kt">rootDir</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="nx">modulesDir</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="nx">dependencies</span>: <span class="kt">await</span> <span class="nx">getRootPackagesToLink</span><span class="p">(</span><span class="nx">filteredLockfile</span><span class="p">,</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">                <span class="c1">// ...
</span></span></span><span class="line"><span class="cl">                <span class="nx">importerManifestsByImporterId</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">                <span class="c1">// ...
</span></span></span><span class="line"><span class="cl">            <span class="p">}),</span>
</span></span><span class="line"><span class="cl">        <span class="p">}]),</span>
</span></span><span class="line"><span class="cl">    <span class="p">),</span>
</span></span><span class="line"><span class="cl"><span class="p">);</span>
</span></span></code></pre></div><p>Now drop the code to produce the mapping from <code>getRootPackagesToLink</code> and we&rsquo;re
done.</p>
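<p>To make the cost difference concrete, here is a minimal, self-contained sketch
of the same pattern. It is not pnpm&rsquo;s actual code: the types and the names
<code>buildManifestsById</code>, <code>lookupBefore</code>, <code>lookupAfter</code>, and <code>makeProjects</code> are
hypothetical stand-ins, and the only thing it measures is how many times the
mapping gets written to.</p>

```typescript
// Hypothetical, simplified stand-ins for pnpm's real types.
type Manifest = { name: string };
type Project = { id: string; manifest: Manifest };
type ManifestsById = { [id: string]: Manifest };

// Counts writes into the mapping so the two strategies can be compared.
let writes = 0;

function buildManifestsById(projects: Project[]): ManifestsById {
    const byId: ManifestsById = {};
    for (const { id, manifest } of projects) {
        byId[id] = manifest;
        writes++;
    }
    return byId;
}

// Before: every call rebuilds the mapping from the full project list.
function lookupBefore(projects: Project[], id: string): Manifest {
    return buildManifestsById(projects)[id];
}

// After: the caller builds the mapping once and passes it in.
function lookupAfter(byId: ManifestsById, id: string): Manifest {
    return byId[id];
}

// Generates a fake project list of the requested size.
function makeProjects(count: number): Project[] {
    return Array.from({ length: count }, (_, i) => ({
        id: `types/pkg-${i}`,
        manifest: { name: `@types/pkg-${i}` },
    }));
}

// One call per project, each rebuilding the mapping: n * n writes.
function countWritesBefore(projects: Project[]): number {
    writes = 0;
    for (const { id } of projects) lookupBefore(projects, id);
    return writes;
}

// Mapping built once up front, then one cheap lookup per project: n writes.
function countWritesAfter(projects: Project[]): number {
    writes = 0;
    const byId = buildManifestsById(projects);
    for (const { id } of projects) lookupAfter(byId, id);
    return writes;
}

const projects = makeProjects(1000);
console.log("writes when rebuilding per call:", countWritesBefore(projects));
console.log("writes when building once:", countWritesAfter(projects));
```

<p>The lookups return the same manifests either way; only the amount of repeated
work changes. With a project list in the thousands, the gap between the two
counts is the gap between &ldquo;n&rdquo; and &ldquo;n squared&rdquo;.</p>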
<p>I sent this over as <a href="https://github.com/pnpm/pnpm/pull/6282">a PR</a> too, and it
is also available in v7.30.4.</p>
<h2 id="the-final-result-for-now">The &ldquo;final&rdquo; result (for now)</h2>
<p>Now that we have these two fixes in, let&rsquo;s re-profile <code>pnpm install</code> for the
newer version:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-plaintext" data-lang="plaintext"><span class="line"><span class="cl">$ npx --package=pnpm@7.30.4 -c &#39;pprof-it $(which pnpm) install&#39;
</span></span><span class="line"><span class="cl"># ...
</span></span><span class="line"><span class="cl">Done in 13.6s
</span></span></code></pre></div><p>Immediately, the difference is evident. There&rsquo;s no longer a huge delay before I
get the cycle warning. The whole thing now takes <em>13.6 seconds</em>. That&rsquo;s a huge
improvement! It&rsquo;s outlandishly good to be processing 9,000+ packages in such a
short time.</p>
<p>What about the profile, though?</p>
<p><img alt="A pprof profile of the finalized code, with the two blocks (mostly) gone, and a lot of little stuff now showing. The overall execution takes about 13 seconds." loading="lazy" src="/posts/pnpm-dt-2/profile3.png#center"></p>
<p>Much different. The huge, obvious blocks are gone, leaving us
with a bunch of small stuff (if two obvious chunks were &ldquo;the dream&rdquo;, a bunch of
small stuff is &ldquo;the nightmare&rdquo;). <code>createPkgGraph</code> is still
the most obvious chunk, which reflects the fact that we didn&rsquo;t fix its
quadratic behavior. But, if we fix that, that&rsquo;ll be a few more seconds saved! Then
we can profile it again, and maybe look into <code>sequenceGraph</code> or
<code>getAllProjects</code>, the next big chunks.</p>
<h2 id="recapping">Recapping</h2>
<p>To recap, we:</p>
<ul>
<li>Ran <code>pnpm</code> on a huge monorepo, and found it to be suspiciously slow, visibly
hanging at times.</li>
<li>Ran <code>pprof-it</code> to take a look under the hood.</li>
<li>Found a couple of big candidates for optimization.</li>
<li>Stared at some code.</li>
<li>Got lucky, addressing both problems by simply shifting some code around.</li>
<li>Made <code>pnpm</code> 4x faster! (For this super ridiculous test case, anyway.)</li>
</ul>
<p>I hope this was informative. Profiling is an excellent trick to have in your
toolbox. Sometimes, you&rsquo;ll be unlucky and it won&rsquo;t show you much. But, when you
<em>do</em> find something, it&rsquo;s worth having spent a few minutes trying it out.</p>
<p>In case you&rsquo;re curious what else we (the TypeScript team and I) have been
able to find, check out these PRs and issues:</p>
<ul>
<li>A <a href="https://github.com/microsoft/TypeScript/pull/53346">performance boost</a> from
avoiding the calculation of all properties of unions / intersections when all
we wanted to know was whether any type matches a condition.</li>
<li>A <a href="https://github.com/microsoft/TypeScript/pull/53358">performance boost</a> by
discovering that a computation was not being cached.</li>
<li>A
<a href="https://github.com/microsoft/TypeScript/issues/52345">performance regression</a>
I (unwittingly) introduced in TypeScript&rsquo;s string template literals when used
with intersections, with two PRs
(<a href="https://github.com/microsoft/TypeScript/pull/53406">#53406</a> and
<a href="https://github.com/microsoft/TypeScript/pull/53413">#53413</a>) attempting to
address it.</li>
<li>A <a href="https://github.com/microsoft/TypeScript/pull/52382">performance boost</a> in
TypeScript 5.0, where I identified that we weren&rsquo;t reusing our &ldquo;printers&rdquo; as
much as we could have, saving a few percent (and even more in some projects).</li>
<li>An <a href="https://github.com/microsoft/TypeScript/pull/44100">older PR</a> where
<code>pprof</code> had pointed out that a lot of time during a build of a TypeScript
project was being spent normalizing paths, even if the platform was UNIX-like
and the paths were already using the correct slashes.</li>
<li>A <a href="https://github.com/microsoft/pyright/pull/1774">PR I sent back</a> when I was
working on Pylance/pyright, where 50% of GC time was spent concatenating
strings.</li>
</ul>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>Well, this used to be true, but might not be anymore. Definitely not if
you <code>git blame</code> the TypeScript repo and forget to use
<code>.git-blame-ignore-revs</code>!
<a href="https://devblogs.microsoft.com/typescript/typescripts-migration-to-modules/">Thanks, modules</a>.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:2">
<p>Okay, this is a fork of
<a href="https://www.npmjs.com/package/pprof">the original</a> released by Google, but
that one hasn&rsquo;t been updated in years, and DataDog&rsquo;s fork includes prebuilt
binaries.&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:3">
<p>This is something I&rsquo;ve been meaning to dig into, but it turns out
to be a problem that also happens to the more typical <code>.cpuprofile</code> files
Node performance nerds may already be familiar with, so I just haven&rsquo;t
prioritized looking into it.&#160;<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content:encoded>
    </item>
  </channel>
</rss>
