
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/_rendered_examples/dynamo/torch_export_cudagraphs.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials__rendered_examples_dynamo_torch_export_cudagraphs.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials__rendered_examples_dynamo_torch_export_cudagraphs.py:


.. _torch_export_cudagraphs:

Torch Export with Cudagraphs
======================================================

This interactive script gives an overview of how to use the Torch-TensorRT CUDA Graphs integration in the ``ir="dynamo"`` path. The integration works similarly in the ``torch.compile`` path.

.. GENERATED FROM PYTHON SOURCE LINES 10-12

Imports and Model Definition
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 12-17

.. code-block:: python


    import torch
    import torch_tensorrt
    import torchvision.models as models


.. GENERATED FROM PYTHON SOURCE LINES 18-20

Compilation with ``torch_tensorrt.compile`` Using Default Settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 20-27

.. code-block:: python


    # We begin by defining and initializing a model
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval().to("cuda")

    # Define sample inputs
    inputs = torch.randn((16, 3, 224, 224)).cuda()


.. GENERATED FROM PYTHON SOURCE LINES 28-44

.. code-block:: python


    # Next, we compile the model using torch_tensorrt.compile
    # We use the `ir="dynamo"` flag here, and `ir="torch_compile"` should
    # work with cudagraphs as well.
    opt = torch_tensorrt.compile(
        model,
        ir="dynamo",
        inputs=torch_tensorrt.Input(
            min_shape=(1, 3, 224, 224),
            opt_shape=(8, 3, 224, 224),
            max_shape=(16, 3, 224, 224),
            dtype=torch.float,
            name="x",
        ),
    )
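
The ``min_shape`` and ``max_shape`` arguments above bound the input shapes the compiled module is built for. As a minimal sketch, a small helper can check a candidate shape against those bounds before inference. The name ``shape_in_range`` is hypothetical and not part of Torch-TensorRT.

.. code-block:: python

    def shape_in_range(shape, min_shape, max_shape):
        """Return True if every dimension of `shape` lies within [min, max]."""
        if not (len(shape) == len(min_shape) == len(max_shape)):
            return False
        return all(lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape))


    # Bounds copied from the torch_tensorrt.Input specification above
    MIN_SHAPE = (1, 3, 224, 224)
    MAX_SHAPE = (16, 3, 224, 224)

    print(shape_in_range((4, 3, 224, 224), MIN_SHAPE, MAX_SHAPE))   # True
    print(shape_in_range((32, 3, 224, 224), MIN_SHAPE, MAX_SHAPE))  # False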


.. GENERATED FROM PYTHON SOURCE LINES 45-47

Inference using the Cudagraphs Integration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 47-60

.. code-block:: python


    # We can enable the cudagraphs API with a context manager
    with torch_tensorrt.runtime.enable_cudagraphs(opt) as cudagraphs_module:
        out_trt = cudagraphs_module(inputs)

    # Alternatively, we can set the cudagraphs mode for the session
    torch_tensorrt.runtime.set_cudagraphs_mode(True)
    out_trt = opt(inputs)

    # We can also turn off cudagraphs mode and perform inference as normal
    torch_tensorrt.runtime.set_cudagraphs_mode(False)
    out_trt = opt(inputs)
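
To see whether cudagraphs mode helps for a given model, it can be useful to time inference in both modes. The helper below is a minimal sketch, not part of Torch-TensorRT: ``mean_latency_ms`` and its parameters are hypothetical names, and the optional ``sync`` callback stands in for a device synchronization call.

.. code-block:: python

    import time


    def mean_latency_ms(fn, n_warmup=3, n_iters=20, sync=None):
        """Return the mean wall-clock latency of `fn()` in milliseconds."""
        for _ in range(n_warmup):
            fn()
        if sync is not None:
            sync()  # drain pending work before starting the clock
        start = time.perf_counter()
        for _ in range(n_iters):
            fn()
        if sync is not None:
            sync()  # wait for queued work before stopping the clock
        return (time.perf_counter() - start) * 1000.0 / n_iters


    # CPU-only stand-in workload so the sketch runs anywhere
    print(f"{mean_latency_ms(lambda: sum(range(1000))):.4f} ms")

With the objects from this tutorial, one might call ``mean_latency_ms(lambda: opt(inputs), sync=torch.cuda.synchronize)`` once with cudagraphs mode enabled and once with it disabled, then compare the two numbers.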


.. GENERATED FROM PYTHON SOURCE LINES 61-70

.. code-block:: python


    # If we provide new input shapes, cudagraphs will re-record the graph
    inputs_2 = torch.randn((8, 3, 224, 224)).cuda()
    inputs_3 = torch.randn((4, 3, 224, 224)).cuda()

    with torch_tensorrt.runtime.enable_cudagraphs(opt) as cudagraphs_module:
        out_trt_2 = cudagraphs_module(inputs_2)
        out_trt_3 = cudagraphs_module(inputs_3)
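
Because a new input shape triggers a re-record, the order in which shapes arrive can affect how often recording happens. The toy model below illustrates this under the assumption that only one recorded graph is kept at a time. It is a conceptual sketch, not Torch-TensorRT's implementation, and ``ReRecordingRunner`` is a hypothetical name.

.. code-block:: python

    from types import SimpleNamespace


    class ReRecordingRunner:
        """Toy model of a runtime that keeps one recorded graph at a time."""

        def __init__(self, fn):
            self.fn = fn
            self.recorded_shape = None
            self.record_count = 0

        def __call__(self, x):
            shape = tuple(x.shape)
            if shape != self.recorded_shape:
                # A real runtime would capture a new CUDA graph here.
                self.recorded_shape = shape
                self.record_count += 1
            # A real runtime would replay the recorded graph here.
            return self.fn(x)


    # Stand-ins for tensors: only the `shape` attribute matters here
    a = SimpleNamespace(shape=(8, 3, 224, 224))
    b = SimpleNamespace(shape=(4, 3, 224, 224))

    runner = ReRecordingRunner(lambda x: x.shape[0])
    for x in (a, a, b, b):  # inputs grouped by shape
        runner(x)
    grouped = runner.record_count

    runner = ReRecordingRunner(lambda x: x.shape[0])
    for x in (a, b, a, b):  # alternating shapes
        runner(x)
    alternating = runner.record_count

    print(grouped, alternating)  # 2 4

Under that assumption, batching same-shaped inputs together keeps the number of recordings low.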


.. GENERATED FROM PYTHON SOURCE LINES 71-85

CUDA Graphs with a module that contains graph breaks
------------------------------------------------------

When CUDA Graphs are applied to a TensorRT model that contains graph breaks, each break adds overhead
because the model can no longer be executed as a single, continuous optimized unit. As a result, some of the
benefits that CUDA Graphs normally provide, such as reduced kernel launch overhead, may be diminished.

Using a wrapped runtime module with CUDA Graphs lets you encapsulate sequences of operations into graphs
that execute efficiently even in the presence of graph breaks. If the TensorRT module has graph breaks, the
CUDA Graphs context manager returns a ``wrapped_module``. This module captures the entire execution graph,
so subsequent inferences replay it with lower kernel launch overhead.

Note that initializing the wrapper module involves a warm-up phase in which the module is executed several
times. This warm-up ensures that memory allocations and initializations are not recorded in the CUDA graph,
which helps keep execution paths consistent.
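
The warm-up, capture, and replay sequence described above can be sketched with PyTorch's own CUDA Graphs API (``torch.cuda.CUDAGraph`` and ``torch.cuda.graph``). This is only an illustration of the general pattern, not the Torch-TensorRT wrapper module itself. The helper name ``capture_and_replay`` and the ``warmup_iters`` default are assumptions made for the example.

.. code-block:: python

    import torch


    def capture_and_replay(fn, example_input, warmup_iters=3):
        """Warm up `fn`, capture it in a CUDA graph, and return a replay callable."""
        static_input = example_input.clone()

        # Warm up on a side stream so one-time allocations and lazy
        # initialization happen before capture, not during it.
        side_stream = torch.cuda.Stream()
        side_stream.wait_stream(torch.cuda.current_stream())
        with torch.cuda.stream(side_stream), torch.no_grad():
            for _ in range(warmup_iters):
                fn(static_input)
        torch.cuda.current_stream().wait_stream(side_stream)

        # Record the kernels launched by `fn` into a graph.
        graph = torch.cuda.CUDAGraph()
        with torch.no_grad(), torch.cuda.graph(graph):
            static_output = fn(static_input)

        def replay(new_input):
            # Replays reuse the captured buffers, so new data must be
            # copied into the static input tensor first.
            static_input.copy_(new_input)
            graph.replay()
            return static_output.clone()

        return replay


    # Only runs when a CUDA device is present
    if torch.cuda.is_available():
        fn = lambda x: torch.relu((x + 2) * 0.5)
        x = torch.randn(1, 3, 224, 224, device="cuda")
        replay = capture_and_replay(fn, x)
        y = torch.randn_like(x)
        torch.testing.assert_close(replay(y), fn(y))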

.. GENERATED FROM PYTHON SOURCE LINES 85-106

.. code-block:: python



    class SampleModel(torch.nn.Module):
        def forward(self, x):
            return torch.relu((x + 2) * 0.5)


    model = SampleModel().eval().cuda()
    input = torch.randn((1, 3, 224, 224)).to("cuda")

    # The 'torch_executed_ops' compiler option is used in this example to intentionally introduce graph breaks within the module.
    # Note: The Dynamo backend is required for the CUDA Graph context manager to handle modules in an Ahead-Of-Time (AOT) manner.
    opt_with_graph_break = torch_tensorrt.compile(
        model,
        ir="dynamo",
        inputs=[input],
        min_block_size=1,
        pass_through_build_failures=True,
        torch_executed_ops={"torch.ops.aten.mul.Tensor"},
    )


.. GENERATED FROM PYTHON SOURCE LINES 107-108

If the module has graph breaks, whole submodules are recorded and replayed by CUDA Graphs.

.. GENERATED FROM PYTHON SOURCE LINES 108-112

.. code-block:: python

    with torch_tensorrt.runtime.enable_cudagraphs(
        opt_with_graph_break
    ) as cudagraphs_module:
        cudagraphs_module(input)


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.000 seconds)


.. _sphx_glr_download_tutorials__rendered_examples_dynamo_torch_export_cudagraphs.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example




    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: torch_export_cudagraphs.py <torch_export_cudagraphs.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: torch_export_cudagraphs.ipynb <torch_export_cudagraphs.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
