Commit ac8832d

auto-generating sphinx docs
pytorchbot committed Oct 30, 2024
1 parent 4cdfe83 commit ac8832d
Showing 5 changed files with 7 additions and 17 deletions.
3 changes: 2 additions & 1 deletion main/_modules/torchtune/data/_collate.html
@@ -855,7 +855,8 @@ Source code for torchtune.data._collate
     if pad_max_images is not None:
         _, _, img_seq = concat_masks.shape
         concat_masks = F.pad(
-            concat_masks, (0, pad_max_images * image_seq_len - img_seq)
+            concat_masks,
+            (0, pad_max_images * max_num_tiles * tokens_per_tile - img_seq),
         )

     batch_dict = {
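For orientation, here is a minimal runnable sketch of what the new padding computes. All sizes below are illustrative assumptions, not values from this commit; the one firm fact is that a 2-tuple passed to F.pad pads only the last dimension, so the mask's image axis is grown to the fixed pad_max_images * max_num_tiles * tokens_per_tile length:

    # Illustrative sketch of the collate-time padding above; all sizes here
    # are made-up assumptions, not values from the commit.
    import torch
    import torch.nn.functional as F

    batch, text_seq_len, img_seq = 2, 8, 50
    pad_max_images, max_num_tiles, tokens_per_tile = 2, 4, 25

    concat_masks = torch.zeros(batch, text_seq_len, img_seq, dtype=torch.bool)

    # A 2-tuple pads only the last dim: (pad_left, pad_right).
    concat_masks = F.pad(
        concat_masks,
        (0, pad_max_images * max_num_tiles * tokens_per_tile - img_seq),
    )
    print(concat_masks.shape)  # torch.Size([2, 8, 200])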
main/_modules/torchtune/models/llama3_2_vision/_transform.html

@@ -522,11 +522,11 @@ Source code for torchtune.models.llama3_2_vision._transform
             tile_size=tile_size,
             patch_size=patch_size,
             image_token_id=self.tokenizer.image_id,
-            max_num_tiles=max_num_tiles,
         )

         self.stop_tokens = self.tokenizer.stop_tokens
         self.max_seq_len = max_seq_len
         self.max_num_tiles = max_num_tiles
+        self.image_seq_len = max_num_tiles * (self.xattn_mask.patches_per_tile + 1)
         self.prompt_template = prompt_template
         self.pad_id = self.tokenizer.pad_id
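The new image_seq_len line is easy to sanity-check by hand. The tile and patch sizes below are assumed values for illustration (not taken from this commit), and the +1 accounts for the extra per-tile embedding alongside the patch tokens:

    # Worked example of image_seq_len = max_num_tiles * (patches_per_tile + 1);
    # tile_size and patch_size are assumed values for illustration only.
    tile_size, patch_size, max_num_tiles = 560, 14, 4

    patches_per_tile = (tile_size // patch_size) ** 2       # (560 // 14) ** 2 = 1600
    image_seq_len = max_num_tiles * (patches_per_tile + 1)  # 4 * 1601
    print(image_seq_len)  # 6404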
13 changes: 2 additions & 11 deletions main/_modules/torchtune/modules/transforms/_transforms.html
@@ -433,7 +433,7 @@ Source code for torchtune.modules.transforms._transforms
 # This source code is licensed under the BSD-style license found in the
 # LICENSE file in the root directory of this source tree.

-from typing import Any, List, Mapping, Optional, Protocol
+from typing import Any, List, Mapping, Protocol

 import torch

@@ -486,21 +486,17 @@
<span class="sd"> E.g. for patch_size = 40, a tile of shape (400, 400) will have 10x10 grid of patches</span>
<span class="sd"> with shape (40, 40) each.</span>
<span class="sd"> image_token_id (int): Token ID of the image special token.</span>
<span class="sd"> max_num_tiles (Optional[int]): Maximum number of tiles in an image, used to</span>
<span class="sd"> pad mask during inference. Defaults to None</span>
<span class="sd"> &quot;&quot;&quot;</span>

<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span>
<span class="bp">self</span><span class="p">,</span>
<span class="n">tile_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
<span class="n">patch_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
<span class="n">image_token_id</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
<span class="n">max_num_tiles</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="nb">int</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
<span class="p">):</span>
<span class="n">patch_grid_size</span> <span class="o">=</span> <span class="n">tile_size</span> <span class="o">//</span> <span class="n">patch_size</span>
<span class="bp">self</span><span class="o">.</span><span class="n">patches_per_tile</span> <span class="o">=</span> <span class="n">patch_grid_size</span><span class="o">**</span><span class="mi">2</span>
<span class="bp">self</span><span class="o">.</span><span class="n">image_token_id</span> <span class="o">=</span> <span class="n">image_token_id</span>
<span class="bp">self</span><span class="o">.</span><span class="n">max_num_tiles</span> <span class="o">=</span> <span class="n">max_num_tiles</span>

<span class="k">def</span> <span class="nf">_get_image_attention_intervals</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tokens</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">int</span><span class="p">])</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="n">List</span><span class="p">[</span><span class="nb">int</span><span class="p">]]:</span>
<span class="w"> </span><span class="sd">&quot;&quot;&quot;</span>
Expand Down Expand Up @@ -592,9 +588,6 @@ <h1>Source code for torchtune.modules.transforms._transforms</h1><div class="hig
<span class="c1"># which can vary based on number of tiles since they are not yet tile padded.</span>
<span class="c1"># The masks are padded and concatenated together in the batch collator</span>
<span class="n">text_seq_len</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span>
<span class="n">max_image_size</span> <span class="o">=</span> <span class="kc">None</span>
<span class="k">if</span> <span class="n">inference</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">max_num_tiles</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">max_image_size</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">max_num_tiles</span> <span class="o">*</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">patches_per_tile</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">masks</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">image_num</span><span class="p">,</span> <span class="n">interval</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">intervals</span><span class="p">):</span>
<span class="c1"># Identify what part of text sequence should be attended</span>
Expand All @@ -607,9 +600,7 @@ <h1>Source code for torchtune.modules.transforms._transforms</h1><div class="hig
<span class="c1"># to a single image, so text tokens attend to all the image&#39;s tokens.</span>
<span class="c1"># The mask is text_seq_len x mask_image_size if defined, otherwise</span>
<span class="c1"># it uses current text/image sequence lengths.</span>
<span class="n">mask</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span>
<span class="n">text_seq_len</span><span class="p">,</span> <span class="n">max_image_size</span> <span class="ow">or</span> <span class="n">image_seq_len</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">bool</span>
<span class="p">)</span>
<span class="n">mask</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">text_seq_len</span><span class="p">,</span> <span class="n">image_seq_len</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">bool</span><span class="p">)</span>
<span class="n">mask</span><span class="p">[</span><span class="n">start</span><span class="p">:</span><span class="n">end</span><span class="p">,</span> <span class="p">:</span><span class="n">image_seq_len</span><span class="p">]</span> <span class="o">=</span> <span class="kc">True</span>
<span class="n">masks</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">mask</span><span class="p">)</span>

Expand Down
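A compact sketch of the simplified mask construction: the interval values below are hypothetical stand-ins for what _get_image_attention_intervals would return, but the zeros-then-slice-assign pattern matches the diff above.

    # Sketch of the simplified mask construction above; intervals are
    # hypothetical stand-ins for _get_image_attention_intervals output.
    import torch

    text_seq_len = 10
    image_seq_len = 6                # tokens contributed by one image
    intervals = [[0, 4], [4, 10]]    # [start, end) text span attending to each image

    masks = []
    for start, end in intervals:
        mask = torch.zeros(text_seq_len, image_seq_len, dtype=torch.bool)
        mask[start:end, :image_seq_len] = True  # text span attends to all image tokens
        masks.append(mask)

    print(masks[0].shape, masks[0].sum().item())  # torch.Size([10, 6]) 24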
VisionCrossAttentionMask (generated API reference page)

@@ -437,7 +437,7 @@ VisionCrossAttentionMask
-class torchtune.modules.transforms.VisionCrossAttentionMask(tile_size: int, patch_size: int, image_token_id: int, max_num_tiles: Optional[int] = None)[source]
+class torchtune.modules.transforms.VisionCrossAttentionMask(tile_size: int, patch_size: int, image_token_id: int)[source]

     Computes the cross-attention mask for text + image inputs. Text tokens that
     participate in cross-attention with an image token will show True in the mask
     and follow the interleaved structure laid out in Fig. 7 of the Flamingo paper

@@ -472,8 +472,6 @@
     E.g. for patch_size = 40, a tile of shape (400, 400) will have 10x10 grid of patches
     with shape (40, 40) each.
     image_token_id (int) – Token ID of the image special token.
-    max_num_tiles (Optional[int]) – Maximum number of tiles in an image, used to
-        pad mask during inference. Defaults to None
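Construction against the updated signature looks like this. The tile and patch numbers mirror the docstring's patch_size = 40 example, and the image token ID is a hypothetical placeholder:

    # Usage sketch for the updated constructor; image_token_id is hypothetical.
    from torchtune.modules.transforms import VisionCrossAttentionMask

    xattn_mask = VisionCrossAttentionMask(
        tile_size=400,          # docstring example: a 400x400 tile
        patch_size=40,          # -> 10x10 = 100 patches per tile
        image_token_id=128256,  # hypothetical image special-token ID
    )
    print(xattn_mask.patches_per_tile)  # 100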
2 changes: 1 addition & 1 deletion main/searchindex.js

Large diffs are not rendered by default.
