# Newton's Method

In optimization, Newton's method from numerical analysis is used for finding the roots of the derivative of a function and thereby discovering its local extrema.

For a function $$f: \mathbb{R}^n \to \mathbb{R}$$, starting from an initial point $$\mathbf{x}_0$$, the method works by computing succsessive points in the function domain

$\mathbf{x}_{n + 1} = \mathbf{x}_n - \eta \left(\mathbf{H}_f\right)_{\mathbf{x}_n}^{-1} \left( \nabla f \right)_{\mathbf{x}_n} \; ,$

where $$\eta > 0$$ is the step size, $$\left(\mathbf{H}_f\right)_{\mathbf{x}_n}^{-1}$$ is the inverse of the Hessian of $$f$$ evaluated at $$\mathbf{x}_n$$, and $$\left( \nabla f \right)_{\mathbf{x}_n}$$ is the gradient of $$f$$ evaluated at $$\mathbf{x}_n$$.

Newton's method converges faster than gradient descent, but this comes at the cost of computing the Hessian of the function at each iteration. In practice, the Hessian is usually only approximated from the changes in the gradient, giving rise to quasi-Netwon methods such as the BFGS algorithm.

Using DiffSharp, we can compute the exact Hessian efficiently via automatic differentiation. The following code implements Newton's method using the DiffSharp.AD.Float64 module, which provides the gradhessian operation returning both the gradient and the Hessian of a function at a given point using forward-on-reverse AD.

 1: 2: 3: 4: 5: 6: 7: 8: 9:  open DiffSharp.AD.Float64 // Newton's method // f: function, x0: starting point, eta: step size, epsilon: threshold let Newton f x0 (eta:D) epsilon = let rec desc x = let g, h = gradhessian f x if DV.l2norm g < epsilon then x else desc (x - eta * (DM.inverse h) * g) desc x0 

Let us find a (local) extremum of the function

$f(\mathbf{x}) = e^{x_1 - 1} + e^{-x_2 + 1} + (x_1 - x_2)^2$

around the point $$(0, 0)$$.

 1: 2: 3: 4:  let f (x:DV) = (exp (x.[0] - 1)) + (exp (- x.[1] + 1)) + ((x.[0] - x.[1]) ** 2) let xmin = Newton f (toDV [0.; 0.]) (D 1.) (D 0.001) let fxmin = f xmin 
 val xmin : DV = DV [|0.7958861818; 1.203482609|] val fxmin : D = D 1.797388803

The extremum is found as $$f(0.7958861818, 1.203482609) = 1.797388803$$.

namespace DiffSharp
module Float64

val Newton : f:(DV -> D) -> x0:DV -> eta:D -> epsilon:D -> DV

Full name: Examples-newtonsmethod.Newton
val f : (DV -> D)
val x0 : DV
val eta : D
type D =
| D of float
| DF of D * D * uint32
| DR of D * D ref * TraceOp * uint32 ref * uint32
interface IComparable
member Copy : unit -> D
override Equals : other:obj -> bool
member GetForward : t:D * i:uint32 -> D
override GetHashCode : unit -> int
member GetReverse : i:uint32 -> D
override ToString : unit -> string
member A : D
member F : uint32
member P : D
member PD : D
member T : D
member A : D with set
member F : uint32 with set
static member Abs : a:D -> D
static member Acos : a:D -> D
static member Asin : a:D -> D
static member Atan : a:D -> D
static member Atan2 : a:int * b:D -> D
static member Atan2 : a:D * b:int -> D
static member Atan2 : a:float * b:D -> D
static member Atan2 : a:D * b:float -> D
static member Atan2 : a:D * b:D -> D
static member Ceiling : a:D -> D
static member Cos : a:D -> D
static member Cosh : a:D -> D
static member Exp : a:D -> D
static member Floor : a:D -> D
static member Log : a:D -> D
static member Log10 : a:D -> D
static member LogSumExp : a:D -> D
static member Max : a:D * b:D -> D
static member Min : a:D * b:D -> D
static member Op_D_D : a:D * ff:(float -> float) * fd:(D -> D) * df:(D * D * D -> D) * r:(D -> TraceOp) -> D
static member Op_D_D_D : a:D * b:D * ff:(float * float -> float) * fd:(D * D -> D) * df_da:(D * D * D -> D) * df_db:(D * D * D -> D) * df_dab:(D * D * D * D * D -> D) * r_d_d:(D * D -> TraceOp) * r_d_c:(D * D -> TraceOp) * r_c_d:(D * D -> TraceOp) -> D
static member Pow : a:int * b:D -> D
static member Pow : a:D * b:int -> D
static member Pow : a:float * b:D -> D
static member Pow : a:D * b:float -> D
static member Pow : a:D * b:D -> D
static member ReLU : a:D -> D
static member Round : a:D -> D
static member Sigmoid : a:D -> D
static member Sign : a:D -> D
static member Sin : a:D -> D
static member Sinh : a:D -> D
static member SoftPlus : a:D -> D
static member SoftSign : a:D -> D
static member Sqrt : a:D -> D
static member Tan : a:D -> D
static member Tanh : a:D -> D
static member One : D
static member Zero : D
static member ( + ) : a:int * b:D -> D
static member ( + ) : a:D * b:int -> D
static member ( + ) : a:float * b:D -> D
static member ( + ) : a:D * b:float -> D
static member ( + ) : a:D * b:D -> D
static member ( / ) : a:int * b:D -> D
static member ( / ) : a:D * b:int -> D
static member ( / ) : a:float * b:D -> D
static member ( / ) : a:D * b:float -> D
static member ( / ) : a:D * b:D -> D
static member op_Explicit : d:D -> float
static member ( * ) : a:int * b:D -> D
static member ( * ) : a:D * b:int -> D
static member ( * ) : a:float * b:D -> D
static member ( * ) : a:D * b:float -> D
static member ( * ) : a:D * b:D -> D
static member ( - ) : a:int * b:D -> D
static member ( - ) : a:D * b:int -> D
static member ( - ) : a:float * b:D -> D
static member ( - ) : a:D * b:float -> D
static member ( - ) : a:D * b:D -> D
static member ( ~- ) : a:D -> D

val epsilon : D
val desc : (DV -> DV)
val x : DV
val g : DV
val h : DM
val gradhessian : f:(DV -> D) -> x:DV -> DV * DM

Multiple items
union case DV.DV: float [] -> DV

--------------------
module DV

--------------------
type DV =
| DV of float []
| DVF of DV * DV * uint32
| DVR of DV * DV ref * TraceOp * uint32 ref * uint32
member Copy : unit -> DV
member GetForward : t:DV * i:uint32 -> DV
member GetReverse : i:uint32 -> DV
member GetSlice : lower:int option * upper:int option -> DV
member ToArray : unit -> D []
member ToColDM : unit -> DM
member ToMathematicaString : unit -> string
member ToMatlabString : unit -> string
member ToRowDM : unit -> DM
override ToString : unit -> string
member Visualize : unit -> string
member A : DV
member F : uint32
member Item : i:int -> D with get
member Length : int
member P : DV
member PD : DV
member T : DV
member A : DV with set
member F : uint32 with set
static member Abs : a:DV -> DV
static member Acos : a:DV -> DV
static member AddItem : a:DV * i:int * b:D -> DV
static member AddSubVector : a:DV * i:int * b:DV -> DV
static member Append : a:DV * b:DV -> DV
static member Asin : a:DV -> DV
static member Atan : a:DV -> DV
static member Atan2 : a:int * b:DV -> DV
static member Atan2 : a:DV * b:int -> DV
static member Atan2 : a:float * b:DV -> DV
static member Atan2 : a:DV * b:float -> DV
static member Atan2 : a:D * b:DV -> DV
static member Atan2 : a:DV * b:D -> DV
static member Atan2 : a:DV * b:DV -> DV
static member Ceiling : a:DV -> DV
static member Cos : a:DV -> DV
static member Cosh : a:DV -> DV
static member Exp : a:DV -> DV
static member Floor : a:DV -> DV
static member L1Norm : a:DV -> D
static member L2Norm : a:DV -> D
static member L2NormSq : a:DV -> D
static member Log : a:DV -> DV
static member Log10 : a:DV -> DV
static member LogSumExp : a:DV -> D
static member Max : a:DV -> D
static member Max : a:D * b:DV -> DV
static member Max : a:DV * b:D -> DV
static member Max : a:DV * b:DV -> DV
static member MaxIndex : a:DV -> int
static member Mean : a:DV -> D
static member Min : a:DV -> D
static member Min : a:D * b:DV -> DV
static member Min : a:DV * b:D -> DV
static member Min : a:DV * b:DV -> DV
static member MinIndex : a:DV -> int
static member Normalize : a:DV -> DV
static member OfArray : a:D [] -> DV
static member Op_DV_D : a:DV * ff:(float [] -> float) * fd:(DV -> D) * df:(D * DV * DV -> D) * r:(DV -> TraceOp) -> D
static member Op_DV_DM : a:DV * ff:(float [] -> float [,]) * fd:(DV -> DM) * df:(DM * DV * DV -> DM) * r:(DV -> TraceOp) -> DM
static member Op_DV_DV : a:DV * ff:(float [] -> float []) * fd:(DV -> DV) * df:(DV * DV * DV -> DV) * r:(DV -> TraceOp) -> DV
static member Op_DV_DV_D : a:DV * b:DV * ff:(float [] * float [] -> float) * fd:(DV * DV -> D) * df_da:(D * DV * DV -> D) * df_db:(D * DV * DV -> D) * df_dab:(D * DV * DV * DV * DV -> D) * r_d_d:(DV * DV -> TraceOp) * r_d_c:(DV * DV -> TraceOp) * r_c_d:(DV * DV -> TraceOp) -> D
static member Op_DV_DV_DM : a:DV * b:DV * ff:(float [] * float [] -> float [,]) * fd:(DV * DV -> DM) * df_da:(DM * DV * DV -> DM) * df_db:(DM * DV * DV -> DM) * df_dab:(DM * DV * DV * DV * DV -> DM) * r_d_d:(DV * DV -> TraceOp) * r_d_c:(DV * DV -> TraceOp) * r_c_d:(DV * DV -> TraceOp) -> DM
static member Op_DV_DV_DV : a:DV * b:DV * ff:(float [] * float [] -> float []) * fd:(DV * DV -> DV) * df_da:(DV * DV * DV -> DV) * df_db:(DV * DV * DV -> DV) * df_dab:(DV * DV * DV * DV * DV -> DV) * r_d_d:(DV * DV -> TraceOp) * r_d_c:(DV * DV -> TraceOp) * r_c_d:(DV * DV -> TraceOp) -> DV
static member Op_DV_D_DV : a:DV * b:D * ff:(float [] * float -> float []) * fd:(DV * D -> DV) * df_da:(DV * DV * DV -> DV) * df_db:(DV * D * D -> DV) * df_dab:(DV * DV * DV * D * D -> DV) * r_d_d:(DV * D -> TraceOp) * r_d_c:(DV * D -> TraceOp) * r_c_d:(DV * D -> TraceOp) -> DV
static member Op_D_DV_DV : a:D * b:DV * ff:(float * float [] -> float []) * fd:(D * DV -> DV) * df_da:(DV * D * D -> DV) * df_db:(DV * DV * DV -> DV) * df_dab:(DV * D * D * DV * DV -> DV) * r_d_d:(D * DV -> TraceOp) * r_d_c:(D * DV -> TraceOp) * r_c_d:(D * DV -> TraceOp) -> DV
static member Pow : a:int * b:DV -> DV
static member Pow : a:DV * b:int -> DV
static member Pow : a:float * b:DV -> DV
static member Pow : a:DV * b:float -> DV
static member Pow : a:D * b:DV -> DV
static member Pow : a:DV * b:D -> DV
static member Pow : a:DV * b:DV -> DV
static member ReLU : a:DV -> DV
static member ReshapeToDM : m:int * a:DV -> DM
static member Round : a:DV -> DV
static member Sigmoid : a:DV -> DV
static member Sign : a:DV -> DV
static member Sin : a:DV -> DV
static member Sinh : a:DV -> DV
static member SoftMax : a:DV -> DV
static member SoftPlus : a:DV -> DV
static member SoftSign : a:DV -> DV
static member Split : d:DV * n:seq<int> -> seq<DV>
static member Sqrt : a:DV -> DV
static member StandardDev : a:DV -> D
static member Standardize : a:DV -> DV
static member Sum : a:DV -> D
static member Tan : a:DV -> DV
static member Tanh : a:DV -> DV
static member Variance : a:DV -> D
static member ZeroN : n:int -> DV
static member Zero : DV
static member ( + ) : a:int * b:DV -> DV
static member ( + ) : a:DV * b:int -> DV
static member ( + ) : a:float * b:DV -> DV
static member ( + ) : a:DV * b:float -> DV
static member ( + ) : a:D * b:DV -> DV
static member ( + ) : a:DV * b:D -> DV
static member ( + ) : a:DV * b:DV -> DV
static member ( &* ) : a:DV * b:DV -> DM
static member ( / ) : a:int * b:DV -> DV
static member ( / ) : a:DV * b:int -> DV
static member ( / ) : a:float * b:DV -> DV
static member ( / ) : a:DV * b:float -> DV
static member ( / ) : a:D * b:DV -> DV
static member ( / ) : a:DV * b:D -> DV
static member ( ./ ) : a:DV * b:DV -> DV
static member ( .* ) : a:DV * b:DV -> DV
static member op_Explicit : d:float [] -> DV
static member op_Explicit : d:DV -> float []
static member ( * ) : a:int * b:DV -> DV
static member ( * ) : a:DV * b:int -> DV
static member ( * ) : a:float * b:DV -> DV
static member ( * ) : a:DV * b:float -> DV
static member ( * ) : a:D * b:DV -> DV
static member ( * ) : a:DV * b:D -> DV
static member ( * ) : a:DV * b:DV -> D
static member ( - ) : a:int * b:DV -> DV
static member ( - ) : a:DV * b:int -> DV
static member ( - ) : a:float * b:DV -> DV
static member ( - ) : a:DV * b:float -> DV
static member ( - ) : a:D * b:DV -> DV
static member ( - ) : a:DV * b:D -> DV
static member ( - ) : a:DV * b:DV -> DV
static member ( ~- ) : a:DV -> DV

val l2norm : v:DV -> D

Multiple items
union case DM.DM: float [,] -> DM

--------------------
module DM

--------------------
type DM =
| DM of float [,]
| DMF of DM * DM * uint32
| DMR of DM * DM ref * TraceOp * uint32 ref * uint32
member Copy : unit -> DM
member GetCols : unit -> seq<DV>
member GetForward : t:DM * i:uint32 -> DM
member GetReverse : i:uint32 -> DM
member GetRows : unit -> seq<DV>
member GetSlice : rowStart:int option * rowFinish:int option * col:int -> DV
member GetSlice : row:int * colStart:int option * colFinish:int option -> DV
member GetSlice : rowStart:int option * rowFinish:int option * colStart:int option * colFinish:int option -> DM
member ToMathematicaString : unit -> string
member ToMatlabString : unit -> string
override ToString : unit -> string
member Visualize : unit -> string
member A : DM
member Cols : int
member F : uint32
member Item : i:int * j:int -> D with get
member Length : int
member P : DM
member PD : DM
member Rows : int
member T : DM
member A : DM with set
member F : uint32 with set
static member Abs : a:DM -> DM
static member Acos : a:DM -> DM
static member AddDiagonal : a:DM * b:DV -> DM
static member AddItem : a:DM * i:int * j:int * b:D -> DM
static member AddSubMatrix : a:DM * i:int * j:int * b:DM -> DM
static member Asin : a:DM -> DM
static member Atan : a:DM -> DM
static member Atan2 : a:int * b:DM -> DM
static member Atan2 : a:DM * b:int -> DM
static member Atan2 : a:float * b:DM -> DM
static member Atan2 : a:DM * b:float -> DM
static member Atan2 : a:D * b:DM -> DM
static member Atan2 : a:DM * b:D -> DM
static member Atan2 : a:DM * b:DM -> DM
static member Ceiling : a:DM -> DM
static member Cos : a:DM -> DM
static member Cosh : a:DM -> DM
static member Det : a:DM -> D
static member Diagonal : a:DM -> DV
static member Exp : a:DM -> DM
static member Floor : a:DM -> DM
static member Inverse : a:DM -> DM
static member Log : a:DM -> DM
static member Log10 : a:DM -> DM
static member Max : a:DM -> D
static member Max : a:D * b:DM -> DM
static member Max : a:DM * b:D -> DM
static member Max : a:DM * b:DM -> DM
static member MaxIndex : a:DM -> int * int
static member Mean : a:DM -> D
static member Min : a:DM -> D
static member Min : a:D * b:DM -> DM
static member Min : a:DM * b:D -> DM
static member Min : a:DM * b:DM -> DM
static member MinIndex : a:DM -> int * int
static member Normalize : a:DM -> DM
static member OfArray : m:int * a:D [] -> DM
static member OfArray2D : a:D [,] -> DM
static member OfCols : n:int * a:DV -> DM
static member OfRows : s:seq<DV> -> DM
static member OfRows : m:int * a:DV -> DM
static member Op_DM_D : a:DM * ff:(float [,] -> float) * fd:(DM -> D) * df:(D * DM * DM -> D) * r:(DM -> TraceOp) -> D
static member Op_DM_DM : a:DM * ff:(float [,] -> float [,]) * fd:(DM -> DM) * df:(DM * DM * DM -> DM) * r:(DM -> TraceOp) -> DM
static member Op_DM_DM_DM : a:DM * b:DM * ff:(float [,] * float [,] -> float [,]) * fd:(DM * DM -> DM) * df_da:(DM * DM * DM -> DM) * df_db:(DM * DM * DM -> DM) * df_dab:(DM * DM * DM * DM * DM -> DM) * r_d_d:(DM * DM -> TraceOp) * r_d_c:(DM * DM -> TraceOp) * r_c_d:(DM * DM -> TraceOp) -> DM
static member Op_DM_DV : a:DM * ff:(float [,] -> float []) * fd:(DM -> DV) * df:(DV * DM * DM -> DV) * r:(DM -> TraceOp) -> DV
static member Op_DM_DV_DM : a:DM * b:DV * ff:(float [,] * float [] -> float [,]) * fd:(DM * DV -> DM) * df_da:(DM * DM * DM -> DM) * df_db:(DM * DV * DV -> DM) * df_dab:(DM * DM * DM * DV * DV -> DM) * r_d_d:(DM * DV -> TraceOp) * r_d_c:(DM * DV -> TraceOp) * r_c_d:(DM * DV -> TraceOp) -> DM
static member Op_DM_DV_DV : a:DM * b:DV * ff:(float [,] * float [] -> float []) * fd:(DM * DV -> DV) * df_da:(DV * DM * DM -> DV) * df_db:(DV * DV * DV -> DV) * df_dab:(DV * DM * DM * DV * DV -> DV) * r_d_d:(DM * DV -> TraceOp) * r_d_c:(DM * DV -> TraceOp) * r_c_d:(DM * DV -> TraceOp) -> DV
static member Op_DM_D_DM : a:DM * b:D * ff:(float [,] * float -> float [,]) * fd:(DM * D -> DM) * df_da:(DM * DM * DM -> DM) * df_db:(DM * D * D -> DM) * df_dab:(DM * DM * DM * D * D -> DM) * r_d_d:(DM * D -> TraceOp) * r_d_c:(DM * D -> TraceOp) * r_c_d:(DM * D -> TraceOp) -> DM
static member Op_DV_DM_DM : a:DV * b:DM * ff:(float [] * float [,] -> float [,]) * fd:(DV * DM -> DM) * df_da:(DM * DV * DV -> DM) * df_db:(DM * DM * DM -> DM) * df_dab:(DM * DV * DV * DM * DM -> DM) * r_d_d:(DV * DM -> TraceOp) * r_d_c:(DV * DM -> TraceOp) * r_c_d:(DV * DM -> TraceOp) -> DM
static member Op_DV_DM_DV : a:DV * b:DM * ff:(float [] * float [,] -> float []) * fd:(DV * DM -> DV) * df_da:(DV * DV * DV -> DV) * df_db:(DV * DM * DM -> DV) * df_dab:(DV * DV * DV * DM * DM -> DV) * r_d_d:(DV * DM -> TraceOp) * r_d_c:(DV * DM -> TraceOp) * r_c_d:(DV * DM -> TraceOp) -> DV
static member Op_D_DM_DM : a:D * b:DM * ff:(float * float [,] -> float [,]) * fd:(D * DM -> DM) * df_da:(DM * D * D -> DM) * df_db:(DM * DM * DM -> DM) * df_dab:(DM * D * D * DM * DM -> DM) * r_d_d:(D * DM -> TraceOp) * r_d_c:(D * DM -> TraceOp) * r_c_d:(D * DM -> TraceOp) -> DM
static member Pow : a:int * b:DM -> DM
static member Pow : a:DM * b:int -> DM
static member Pow : a:float * b:DM -> DM
static member Pow : a:DM * b:float -> DM
static member Pow : a:D * b:DM -> DM
static member Pow : a:DM * b:D -> DM
static member Pow : a:DM * b:DM -> DM
static member ReLU : a:DM -> DM
static member ReshapeToDV : a:DM -> DV
static member Round : a:DM -> DM
static member Sigmoid : a:DM -> DM
static member Sign : a:DM -> DM
static member Sin : a:DM -> DM
static member Sinh : a:DM -> DM
static member SoftPlus : a:DM -> DM
static member SoftSign : a:DM -> DM
static member Solve : a:DM * b:DV -> DV
static member SolveSymmetric : a:DM * b:DV -> DV
static member Sqrt : a:DM -> DM
static member StandardDev : a:DM -> D
static member Standardize : a:DM -> DM
static member Sum : a:DM -> D
static member Tan : a:DM -> DM
static member Tanh : a:DM -> DM
static member Trace : a:DM -> D
static member Transpose : a:DM -> DM
static member Variance : a:DM -> D
static member ZeroMN : m:int -> n:int -> DM
static member Zero : DM
static member ( + ) : a:int * b:DM -> DM
static member ( + ) : a:DM * b:int -> DM
static member ( + ) : a:float * b:DM -> DM
static member ( + ) : a:DM * b:float -> DM
static member ( + ) : a:DM * b:DV -> DM
static member ( + ) : a:DV * b:DM -> DM
static member ( + ) : a:D * b:DM -> DM
static member ( + ) : a:DM * b:D -> DM
static member ( + ) : a:DM * b:DM -> DM
static member ( / ) : a:int * b:DM -> DM
static member ( / ) : a:DM * b:int -> DM
static member ( / ) : a:float * b:DM -> DM
static member ( / ) : a:DM * b:float -> DM
static member ( / ) : a:D * b:DM -> DM
static member ( / ) : a:DM * b:D -> DM
static member ( ./ ) : a:DM * b:DM -> DM
static member ( .* ) : a:DM * b:DM -> DM
static member op_Explicit : d:float [,] -> DM
static member op_Explicit : d:DM -> float [,]
static member ( * ) : a:int * b:DM -> DM
static member ( * ) : a:DM * b:int -> DM
static member ( * ) : a:float * b:DM -> DM
static member ( * ) : a:DM * b:float -> DM
static member ( * ) : a:D * b:DM -> DM
static member ( * ) : a:DM * b:D -> DM
static member ( * ) : a:DV * b:DM -> DV
static member ( * ) : a:DM * b:DV -> DV
static member ( * ) : a:DM * b:DM -> DM
static member ( - ) : a:int * b:DM -> DM
static member ( - ) : a:DM * b:int -> DM
static member ( - ) : a:float * b:DM -> DM
static member ( - ) : a:DM * b:float -> DM
static member ( - ) : a:D * b:DM -> DM
static member ( - ) : a:DM * b:D -> DM
static member ( - ) : a:DM * b:DM -> DM
static member ( ~- ) : a:DM -> DM

val inverse : m:DM -> DM

val f : x:DV -> D

Full name: Examples-newtonsmethod.f
val exp : value:'T -> 'T (requires member Exp)

Full name: Microsoft.FSharp.Core.Operators.exp
val xmin : DV

Full name: Examples-newtonsmethod.xmin
val toDV : v:seq<'a> -> DV (requires member op_Explicit)