@shorttitle(About FastMath)
@title(FastMath - Fast Math Library for Delphi)

FastMath is a Delphi math library that is optimized for fast performance
(sometimes at the cost of not performing error checking or losing a little
accuracy). It uses hand-optimized assembly code to achieve much better
performance then the equivalent functions provided by the Delphi RTL.

This makes FastMath ideal for high-performance math-intensive applications such
as multi-media applications and games. For even better performance, the library
provides a variety of "approximate" functions (which all start with a 
@code(Fast)-prefix). These can be very fast, but you will lose some (sometimes 
surprisingly little) accuracy. For gaming and animation, this loss in accuracy 
is usually perfectly acceptable and outweighed by the increase in speed. Don't 
use them for scientific calculations though...

You may want to call @code(DisableFloatingPointExceptions) at application 
startup to suppress any floating-point exceptions. Instead, it will return 
extreme values (like Nan or Infinity) when an operation cannot be performed. If
you use FastMath in multiple threads, you should call
@code(DisableFloatingPointExceptions) in the @code(Execute) block of those 
threads.

@section(1 _Performance Superior Performance)
Most operations can be performed on both singular values (scalars) as well
as vectors (consisting of 2, 3 or 4 values). SIMD optimized assembly code is
used to calculate multiple outputs at the same time. For example, adding two
4-value vectors together is almost as fast as adding two single values together,
resulting in a 4-fold speed increase. Many functions are written in such a way
that the performance is even better. You will find a lot of functions that are
10 or more times faster then their Delphi counterparts. 

On 32-bit and 64-bit desktop platforms (Windows and OS X), this performance is 
achieved by using the SSE2 instruction set. This means that the computer must 
support SSE2. However, since SSE2 was introduced back in 2001, the vast majority 
of computers in use today will support it. All 64-bit desktop computers have
SSE2 support by default. However, you can always compile this library with
the @code(FM_NOSIMD) define to disable SIMD optimization and use plain Pascal 
versions. This can also be useful to compare the speed of the Pascal versions 
with the SIMD optimized versions.

On 32-bit mobile platforms (iOS and Android), the NEON instruction set is used
for SIMD optimization. This means that your device needs to support NEON. But
since Delphi already requires this, this poses no further restrictions.

On 64-bit mobile platforms (iOS), the Arm64/AArch64 SIMD instruction set is
used.

@section(1 _Architecture Architecture and Design Decisions)
FastMath operations on single-precision floating-point values only.
Double-precision floating-point arithmetic is (currently) unsupported.

Most functions operate on single values (of type @code(Single)) and 2-, 3- and 
4-dimensional vectors (of types @code(TVector2), @code(TVector3) and 
@code(TVector4) respectively). Vectors are not only used to represent points or
directions in space, but can also be regarded as arrays of 2, 3 or 4 values that
can be used to perform calculations in parallel. In addition to floating-point
vectors, there are also vectors that operator on integer values 
(@code(TIVector2), @code(TIVector3) and @code(TIVector4)).

There is also support for 2x2, 3x3 and 4x4 matrices (called @code(TMatrix2), 
@code(TMatrix3) and @code(TMatrix4)). By default, matrices are stored in 
row-major order, like those in the RTL's @code(System.Math.Vectors) unit. 
However, you can change this layout with the @code(FM_COLUMN_MAJOR) define. 
This will store matrices in column-major order instead, which is useful for 
OpenGL applications (which work best with this layout). In addition, this define 
will also clip the depth of camera matrices to -1..1 instead of the default 
0..1. Again, this is more in line with the default for OpenGL applications.

For representing rotations in 3D space, there is also a @code(TQuaternion), 
which is similar to the RTL's @code(TQuaternion3D) type.

The operation of the library is somewhat inspired by shader languages (such as 
GLSL and HLSL). In those languages you can also treat single values and vectors 
similarly. For example, you can use the @code(Sin) function to calculate a 
single sine value, but you can also use it with a @code(TVector4) type to 
calculate 4 sine values in one call. When combined with the "approximate" 
@code(Fast)* functions, this can result in an enormous performance boost. For
example, using @code(FastSin(TVector4)) to calculate 4 sine values in parallel
is up to 40 times faster than 4 separate calls to @code(Sin(Single)). On the
extreme end, calling @code(FastExp2(TVector4)) is up to 300 times faster than 4
separate @code(Exp2(Single)) calls.

@section(1 _Operators Overloaded Operators)
All vector and matrix types support overloaded operators which allow you to
negate, add, subtract, multiply and divide scalars, vectors and matrices. There
are also overloaded operators that compare vectors and matrices for equality.
These operators check for "exact" matches (like Delphi's "=" operator). They
@bold(don't) allow for very small variations (like Delphi's @code(SameValue) 
functions).

The arithmetic operators "+", "-", "*" and "/" usually work component-wise when
applied to vectors. For example if @code(A) and @code(B) are of type 
@code(TVector4), then @code(C := A * B) will set @code(C) to 
@code((A.X * B.X, A.Y * B.Y, A.Z * B.Z, A.W * B.W)). It will @bold(not) perform 
a dot or cross product (you can use the @code(Dot) and @code(Cross) functions to 
compute those).

For matrices, the "+" and "-" operators also operate component-wise. However,
when multiplying (or dividing) matrices with vectors or other matrices, then the
usual linear algebraic multiplication (or division) is used. For example:

* @code(M := M1 * M2) performs a linear algebraic matrix multiplication
* @code(V := M1 * V1) performs a matrix * row vector linear algebraic 
  multiplication
* @code(V := V1 * M1) performs a column vector * matrix linear algebraic 
  multiplication

To multiply matrices component-wise, you can use the @code(CompMult) method.

@section(1 _Interop Interoperability with the Delphi RTL)
FastMath provides its own vector and matrix types for superior performance. Most 
of them are equivalent in functionality and data storage to the Delphi RTL 
types. You can typecast between them or implicitly convert from the FastMath 
type to the RTL type or vice versa (eg. @code(MyVector2 := MyPointF)). The
following table shows the mapping:

@table(
@rowHead(@cell(Purpose)         @cell(FastMath)    @cell(Delphi RTL) )
@row(    @cell(2D point/vector) @cell(TVector2)    @cell(TPointF)    )
@row(    @cell(3D point/vector) @cell(TVector3)    @cell(TPoint3D)   )
@row(    @cell(4D point/vector) @cell(TVector4)    @cell(TVector3D)  )
@row(    @cell(2x2 matrix)      @cell(TMatrix2)    @cell(N/A)        )
@row(    @cell(3x3 matrix)      @cell(TMatrix3)    @cell(TMatrix)    )
@row(    @cell(4x4 matrix)      @cell(TMatrix4)    @cell(TMatrix3D)  )
@row(    @cell(quaternion)      @cell(TQuaternion) @cell(TQuaternion3D)  )
)

@section(1 _Functions Functions)
Below you will find a categorized list of the global functions supported by
FastMath:

@section(2 _Helpers Helper functions for creating vectors and matrices)
* Vector2: creates a 2D vector
* Vector3: creates a 3D vector
* Vector4: creates a 4D vector
* Matrix2: creates a 2x2 matrix
* Matrix3: creates a 3x3 matrix
* Matrix4: creates a 4x4 matrix
* Quaternion: creates a quaternion
* IVector2: creates a 2D integer vector
* IVector3: creates a 3D integer vector
* IVector4: creates a 4D integer vector

@section(2 _Trig Angle and Trigonometry Functions)
* Radians: converts degrees to radians
* Degrees: converts radians to degrees
* Sin: calculates a sine of an angle
* Cos: calculates a cosine of an angle
* SinCos: calculates a sine/cosine pair
* Tan: calculates the tangent of an angle
* ArcSin: calculates an arc sine
* ArcCos: calculates an arc cosine
* ArcTan: calculates an arc tangent
* ArcTan2: calculates an arctangent angle and quadrant
* Sinh: calculates a hyperbolic sine
* Cosh: calculates a hyperbolic cosine
* Tanh: calculates a hyperbolic tangent
* ArcSinh: calculates an inverse hyperbolic sine
* ArcCosh: calculates an inverse hyperbolic cosine
* ArcTanh: calculates an inverse hyperbolic tangent

@section(2 _Exponential Exponential Functions)
* Power: raises a base to a power
* Exp: calculates a natural exponentiation (that is, e raised to a given power)
* Ln: calculates a natural logarithm
* Exp2: calculates 2 raised to a power
* Log2: calculates a base 2 logarithm
* Sqrt: calculates a square root
* InverseSqrt: calculates an inverse square root

@section(2 _Approx Fast Approximate Functions)
* FastSin: fast sine function
* FastCos: fast cosine function
* FastSinCos: fast sine/cosine function
* FastTan: fast tangent function
* FastArcTan2: fast arctangent angle and quadrant
* FastPower: fast power function
* FastExp: fast natural exponentiation function
* FastLn: fast natural logarithm function
* FastLog2: fast base 2 logarithm function
* FastExp2: fast Exp2 function

@section(2 _Common Common Functions)
* Abs: calculates an absolute value
* Sign: calculates the sign of a value
* Floor: rounds a value towards negative infinity
* Trunc: rounds a value towards 0
* Round: rounds a value towards its nearest integer
* Ceil: rounds a value towards positive infinity
* Frac: returns the fractional part of a number
* FMod: calculates the remainder of a floating-point division
* ModF: splits a floating-point value into its integer and fractional parts
* Min: calculates the minimum of two values
* Max: calculates the maximum of two values
* EnsureRange: clamps a given value into a range
* Mix: calculates a linear blend between two values, using on a progress value
* Step: step function
* SmoothStep: performs smooth Hermite interpolation between 0 and 1
* FMA: Fused Multiply and Add

@section(2 _Matrix Matrix Functions)
* OuterProduct: multiplies a column vector with a row vector

@section(2 _Config Configuration Functions)
* DisableFloatingPointExceptions: disables floating-point exceptions
* RestoreFloatingPointExceptions: restore the floating-point exception flags