
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/_rendered_examples/dynamo/dynamo_compile_transformers_example.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_tutorials__rendered_examples_dynamo_dynamo_compile_transformers_example.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials__rendered_examples_dynamo_dynamo_compile_transformers_example.py:


.. _torch_compile_transformer:

Compiling a Transformer using `torch_tensorrt.dynamo.compile` and TensorRT
==============================================================================

This interactive script is intended as a sample of the `torch_tensorrt.dynamo.compile` workflow on a transformer-based model.

.. GENERATED FROM PYTHON SOURCE LINES 10-12

Imports and Model Definition
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 12-17

.. code-block:: python


    import torch
    import torch_tensorrt
    from transformers import BertModel


.. GENERATED FROM PYTHON SOURCE LINES 18-27

.. code-block:: python


    # Initialize model with float precision and sample inputs
    model = BertModel.from_pretrained("bert-base-uncased").eval().to("cuda")
    inputs = [
        torch.randint(0, 2, (1, 14), dtype=torch.int32).to("cuda"),
        torch.randint(0, 2, (1, 14), dtype=torch.int32).to("cuda"),
    ]



.. GENERATED FROM PYTHON SOURCE LINES 28-30

Optional Input Arguments to `torch_tensorrt.dynamo.compile`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 30-47

.. code-block:: python


    # Enabled precision for TensorRT optimization
    enabled_precisions = {torch.float}

    # Whether to print verbose logs
    debug = True

    # Workspace size for TensorRT, in bytes (20 << 30 is 20 GiB)
    workspace_size = 20 << 30

    # Minimum number of operators per TRT Engine block
    # (Lower value allows more graph segmentation)
    min_block_size = 3

    # Operations to run in Torch, regardless of converter support
    # (an empty set; a bare {} would create a dict instead)
    torch_executed_ops = set()

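To make the effect of `min_block_size` concrete, here is a toy, pure-Python sketch. It is not Torch-TensorRT's actual partitioner, which operates on a graph rather than a list. It assumes the model is a flat list of operator names and that a hypothetical `supported` set marks the operators with converters. The `partition` helper and the operator names are illustrative only. Runs of supported operators shorter than `min_block_size` fall back to Torch.

.. code-block:: python

    def partition(ops, supported, min_block_size):
        """Split a flat op list into ("trt" | "torch", [ops]) segments."""
        segments = []
        run = []  # current run of consecutive supported ops

        def append(backend, block):
            # Merge with the previous segment when the backend matches
            if segments and segments[-1][0] == backend:
                segments[-1][1].extend(block)
            else:
                segments.append((backend, block))

        def flush_run():
            # A run becomes a TRT block only if it is long enough
            if run:
                backend = "trt" if len(run) >= min_block_size else "torch"
                append(backend, list(run))
                run.clear()

        for op in ops:
            if op in supported:
                run.append(op)
            else:
                flush_run()
                append("torch", [op])
        flush_run()
        return segments


    # Hypothetical operator sequence; "custom_op" has no converter
    ops = ["matmul", "add", "gelu", "custom_op", "matmul", "add", "custom_op", "softmax"]
    supported = {"matmul", "add", "gelu", "softmax"}

    # min_block_size=3 keeps only the first run in TensorRT (1 TRT block)
    print(partition(ops, supported, min_block_size=3))

    # min_block_size=1 accepts every run, giving 3 TRT blocks
    print(partition(ops, supported, min_block_size=1))

In this toy model, lowering `min_block_size` produces more, smaller TensorRT blocks, while raising it sends short runs back to Torch.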

.. GENERATED FROM PYTHON SOURCE LINES 48-50

Compilation with `torch_tensorrt.dynamo.compile`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 50-62

.. code-block:: python


    # Build and compile the model with torch_tensorrt.dynamo.compile
    optimized_model = torch_tensorrt.dynamo.compile(
        model,
        inputs,
        enabled_precisions=enabled_precisions,
        debug=debug,
        workspace_size=workspace_size,
        min_block_size=min_block_size,
        torch_executed_ops=torch_executed_ops,
    )


.. GENERATED FROM PYTHON SOURCE LINES 63-65

Equivalently, the same compilation can be invoked through the convenience frontend:
`torch_tensorrt.compile(model, ir="dynamo_compile", inputs=inputs, ...)`

.. GENERATED FROM PYTHON SOURCE LINES 67-69

Inference
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 69-77

.. code-block:: python


    # Does not cause recompilation (same batch size as input)
    new_inputs = [
        torch.randint(0, 2, (1, 14), dtype=torch.int32).to("cuda"),
        torch.randint(0, 2, (1, 14), dtype=torch.int32).to("cuda"),
    ]
    new_outputs = optimized_model(*new_inputs)


.. GENERATED FROM PYTHON SOURCE LINES 78-86

.. code-block:: python


    # Does cause recompilation (new batch size)
    new_inputs = [
        torch.randint(0, 2, (4, 14), dtype=torch.int32).to("cuda"),
        torch.randint(0, 2, (4, 14), dtype=torch.int32).to("cuda"),
    ]
    new_outputs = optimized_model(*new_inputs)

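The comments above note that a new batch size triggers recompilation while a repeated shape does not. One way to picture this is a cache keyed by input shapes. The sketch below is a toy, pure-Python illustration under that assumption, not a description of how Torch-TensorRT is implemented. `ShapeSpecializedModel` and `fake_compile` are hypothetical names, and shapes are passed directly in place of tensors.

.. code-block:: python

    class ShapeSpecializedModel:
        """Toy stand-in for a compiled module specialized on input shapes."""

        def __init__(self, compile_fn):
            self._compile_fn = compile_fn
            self._engines = {}  # maps a tuple of input shapes to a callable
            self.compile_count = 0

        def __call__(self, *shapes):
            key = tuple(shapes)
            if key not in self._engines:
                # Unseen shapes: build and remember a new engine
                self._engines[key] = self._compile_fn(key)
                self.compile_count += 1
            return self._engines[key]()


    def fake_compile(key):
        # Hypothetical "engine" that only reports the shapes it was built for
        return lambda: f"ran engine for {key}"


    toy_model = ShapeSpecializedModel(fake_compile)
    toy_model((1, 14), (1, 14))  # first call: compiles
    toy_model((1, 14), (1, 14))  # same shapes: reuses the engine
    toy_model((4, 14), (4, 14))  # new batch size: compiles again
    print(toy_model.compile_count)

Under this model, the second call is cheap because its shapes match an existing entry, and the third pays the compilation cost again.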

.. GENERATED FROM PYTHON SOURCE LINES 87-89

Cleanup
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 89-95

.. code-block:: python


    # Finally, we use Torch utilities to clean up the workspace
    torch._dynamo.reset()

    with torch.no_grad():
        torch.cuda.empty_cache()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.000 seconds)


.. _sphx_glr_download_tutorials__rendered_examples_dynamo_dynamo_compile_transformers_example.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example




    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: dynamo_compile_transformers_example.py <dynamo_compile_transformers_example.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: dynamo_compile_transformers_example.ipynb <dynamo_compile_transformers_example.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
