** **
17 Dec 2021

17 Dec 2021

# Using analog computers in today's largest computational challenges

Sven Köppel Bernd Ulmann Lars Heimann and Dirk Killat

^{1},

^{1},

^{1},

^{1,2}

**Sven Köppel et al.**Sven Köppel Bernd Ulmann Lars Heimann and Dirk Killat

^{1},

^{1},

^{1},

^{1,2}

^{1}Anabrid GmbH, Am Stadtpark 3, 12167 Berlin, Germany^{2}Microelectronics Department, Brandenburg University of Technology, 03046 Cottbus, Germany

^{1}Anabrid GmbH, Am Stadtpark 3, 12167 Berlin, Germany^{2}Microelectronics Department, Brandenburg University of Technology, 03046 Cottbus, Germany

**Correspondence**: Sven Köppel (koeppel@anabrid.com)

**Correspondence**: Sven Köppel (koeppel@anabrid.com)

Received: 15 Feb 2021 – Revised: 30 Jun 2021 – Accepted: 07 Jul 2021 – Published: 17 Dec 2021

Analog computers can be revived as a feasible technology platform for low precision, energy efficient and fast computing. We justify this statement by measuring the performance of a modern analog computer and comparing it with that of traditional digital processors. General statements are made about the solution of ordinary and partial differential equations. Computational fluid dynamics are discussed as an example of large scale scientific computing applications. Several models are proposed which demonstrate the benefits of analog and digital-analog hybrid computing.

- Article
(738 KB) -
Supplement
(58 KB) - BibTeX
- EndNote

Digital computing has transformed many — if not close to all — aspects
of industry, humanities and science. Turing completeness allows
statements to be made about the computability and decidability of problems and *computational power* of machines. Digital storage has undergone numerous technological
advances and is available in increasingly vast amounts. Nevertheless, contemporary digital
computing is possibly not the last word in computing, despite its dominance in the consumer
market for the last 40+ years.

Fundamental research about non-traditional (also referred to as *unconventional*
or *exotic*) computing is taking place in material
sciences, chemistry but also in more exotic branches such as biology and life sciences.
Amongst others, beyond-Turing computing (Siegelmann, 1995),
natural computing (Calude et al., 1999),
neuromorphic computing (Schuman et al., 2017; Ziegler, 2020) or
quantum computing (Zhou et al., 2020; Georgescu et al., 2014; Kendon et al., 2010) are fields of active
investigation. Being fundamental research at heart, these disciplines come with technological challenges.
For instance, computing with DNA still requires the use of large scale laboratory
equipment and machinery
(Deaton et al., 1998).
Currently, not only the low-temperature laboratory conditions but also the
necessary error correction schemes challenge practical quantum computers
(Wilhelm et al., 2017).
This currently negates any practical advantage over silicon based digital computing.
Furthermore,
all of these alternative (or *exotic*) computer architectures share
the characteristic that they are fundamentally *non-portable*. This means they will
have to be located at large facilities and dedicated special-purpose computing centers for a
long time, if not forever. This is not necessarily a practical drawback,
since the internet allows for delocalization of systems.

In contrast to this, silicon based *electronic analog computing* is a technology with
a rich history,
which operates in a normal workplace environment
(non-laboratory conditions; Ulmann, 2020).
Digital computers overtook their analog counterparts in the last century,
primarily due to their ever-increasing digital clock speeds and their flexibility that
comes from their algorithmic approach and the possibility of using these machines in
a time-shared environment. However, today Moore's law is coming to a hard
stop and processor clock speeds have not significantly increased in the past decade. Manycore
architectures and vectorization come with their own share of problems, given their
fundamental limits as described, for instance, by Amdahl's law (Rodgers, 1985).
GPGPUs and specialized digital computing chips concentrate on
vectorized, and even data flow-oriented programming paradigms but are still limited by
parasitic capacitances which determine the maximum possible clock frequency and provide a
noticeable *energy barrier*.

Thanks to their properties, analog computers have attracted the interest of many research groups. For surveys of theory and applications, see for instance Bournez and Pouly (2021) or the works of MacLennan (2004, 2012, 2019). In this paper, we study the usability of analog computers for applications in science. The fundamental properties of analog computers are low power requirements, low resolution computation and intrinsic parallelism. Two very different uses cases/scenarios can be identified: High performance computing (HPC) and low energy portable computing. The energy and computational demands for both scenarios are diametrically-opposed and this paper is primarily focused on HPC.

The paper is structured as follows: In Sect. 2, we review the general assumptions about digital and analog computing. In Sect. 3, small scale benchmark results are presented for a simple ordinary differential equation. In Sect. 4, a typical partial differential equation is considered as an example for a large scale problem. Spatial discretization effects and computer architecture design choices are discussed. Finally, Sect. 5 summarizes the findings.

In this paper, we study different techniques for solving differential equations computationally.
Due to the different conventions in algorithmic and analog approaches, a common language
had to be found and is described in this section. Here, the term *algorithmic approach* addresses the classical Euler method or classical quasi-linear techniques
in ordinary or partial differential equations (ODEs/PDEs), *i.e.*, general methods of numerical mathematics.
The term *analog approach* addresses the continuous time integration with
an operational amplifier having a capacitor in the feedback loop.
The fundamental measures of computer performance under consideration are the
time-to-solution *T*, the power consumption *P* and the energy demand *E*.

## 2.1 Time to solution

The time-to-solution *T* is the elapsed real time (lab time or wall clock time) for solving a
differential equation ${\partial}_{\mathrm{t}}u=f\left(u\right)$ from its initial condition *u*(*t*_{0}) at time
*t*_{0} to some target simulation time *t*_{final},
*i.e.*, for obtaining *u*(*t*_{final}). The *speed factor* ${k}_{\mathrm{0}}:=T/{t}_{\mathrm{final}}$ is the ratio of
elapsed simulation time per wall clock time.
On analog computers, this allows to identify the *maximum frequency* $\mathit{\nu}={k}_{\mathrm{0}}/\left(\mathrm{2}\mathit{\pi}\phantom{\rule{0.125em}{0ex}}\mathrm{s}\right)$.
On digital computers, the time-to-solution is used as an estimator (in a statistical sense) for the average *k*_{0}.
Relating this quantity to measures in numerical schemes is an important discussion point in this paper.
Given the simplest possible ODE,

one can study the analog/digital
computer performance in terms of the *complexity* of *f*(*y*).
For a problem *M* times as big as the given one, the inherently fully *parallel* analog computer
exhibits a constant time-to-solution, *i.e.*, in other terms,

In contrast,
a single core (*i.e.*, nonvectorized, nor superscalar architecture) digital computer
operates in a *serial* fashion and can achieve a time-to-solution

Here, *T*^{1} refers to the time-to-solution for solving Eq. (1), while
*T*^{M} refers to the time-to-solution for solving a problem *M* times as hard.
*M*∈ℕ is the measure for the algorithmic complexity of *f*(*y*).
*f*(*M*)=𝒪(*g*(*M*)) refers to the Bachmann-Landau asymptotic notation.
The number of computational elements required to implement *f*(*y*) on an analog computer or
the number of instructions required for computing *f*(*y*) on a digital computer could provide
numbers for *M*.
This is because it is assumed that the evaluation of *f*(*y*) can hardly be numerically parallelized.
For a system of *N* coupled ODEs $\mathrm{d}{y}_{i}/\mathrm{d}t={f}_{i}({y}_{\mathrm{1}},\mathrm{\dots},{y}_{N})$, the vector-valued
** f** can be assigned an effective complexity 𝒪(

*N*

*M*) with the same reasoning. However, an overall complexity 𝒪(

*M*) is more realistic since parallelism could be exploited more easily in the direction of

*N*(MIMD, multiple instruction, multiple data).

Furthermore, multi-step schemes
implementing higher order numerical time integration can exploit digital parallelization
(however, in general the serial time-to-solution of a
numerical Euler scheme is the limit for the fastest possible digital time integration).
Digital parallelization is always limited by the inherently serial parts of a problem
(Amdahl's law, Rodgers, 1985),
which makes the evaluation of *f*(*y*) the hardest part of the problem. Section 4
discusses complex functions *f*(*y*) in the context of the *method of lines* for PDEs.

It should be emphasized that, in the general case, this estimate for the digital computer is
a most optimistic (best) estimate, using today's numerical methods.
It does not take into account hypothetical algorithmic
“shortcuts” which could archive solutions faster than 𝒪(*M*), because they imply some
knowledge about the internal structure of *f*(*y*) which could probably also be exploited in analog
implementations.

## 2.2 Power and energy scaling for the linear model

For a given problem with time-to-solution *T* and average power consumption *P*, the overall energy
is estimated by *E*=*P**T* regardless of the computer architecture.

In general, an analog computer has to grow with the problem size *M*. Given constant power requirements per
computing element and neglecting increasing resistances or parasitic capacitances, in general one can
assume the analog computer power requirement ${P}_{A}^{M}$ for a size *M* problem to scale from a size 1
problem ${P}_{A}^{\mathrm{1}}$ as ${P}_{A}^{M}={P}_{A}^{\mathrm{1}}\cdot M$. In contrast,
a serial single node digital computer
in principle can compute a problem of any size serially by relying on dynamic memory (DRAM),
*i.e.*, ${P}_{\mathrm{D}}^{M}={P}_{\mathrm{D}}^{\mathrm{1}}$. That is, the digital computer
power requirements for running a large problem (${P}_{\mathrm{D}}^{M}$) are (at first approximation)
similar to running a small problem ${P}_{\mathrm{D}}^{\mathrm{1}}$. Typically,
the DRAM energy demands are one to two orders of magnitude smaller than those of a desktop or
server grade processor and are therefore negligible for this estimate.

Interestingly, this model suggests that the overall energy requirements to solve a large
problem on an analog and digital computer, respectively, are both ${E}_{\mathrm{D}}^{M}$ and ${E}_{A}^{M}=\mathcal{O}\left(M\right)$,
*i.e.*, the analog-digital energy ratio remains constant despite the fact that the analog
computer computes (runs) linearly faster with increasing problem size *M*. This can be easily
deduced by $E=P\cdot T$. In this model, it is furthermore

The *orthogonal* performance features of the fully-parallel analog computer
and the fully-serial digital computer are also summarized in Table 1.

When comparing digital and analog computer power consumption, the power consumption under consideration should
include the *total* computer power including administrative parts (like network infrastructure,
analog-to-digital converters or cooling) and power supplies. In this work, data of heterogenous sources are compared
and definitions may vary.

## 2.3 Criticism and outlook

Given that the digital and analog technology (electric representation of information, transistor-based
computation) is quite similar, the model prediction of a similarly growing energy demand is useful.
Differences are of course hidden in the constants (prefactors) of the asymptotic notation 𝒪(*M*).
Quantitative studies in the next sections examine this prefactor in 𝒪(*M*).

The linear model is already limited in the case of serial digital processors when the computation gets
*memory bound* (instead of *CPU-bound*). Having to wait for data leads to a performance drop and might
result in a worsened superlinear ${T}_{\mathrm{D}}^{M}$.

*Parallel digital* computing as well as *serial analog* computing
has not yet been subject of the previous discussion.
While the first one is a widespread standard technique, the second
one refers to analog-digital hybrid computing which, inter alia, allows a small analog
computer to be used repeatedly on a large problem, effectively rendering the analog part as an analog
accelerator or co-processor for the digital part. Parallel digital computing suffers from a theoretical
speedup limited due to the non-parallel parts of the algorithm
(see also Gustafson, 1988),
which has exponential impact on ${T}_{\mathrm{D}}^{M}$. This is where the intrinsically parallel analog computer exhibits
its biggest advantages. Section 4 discusses this aspect of analog computing.

In this section, quantitative measurements between contemporary analog
and digital computers will be made. We use the *Analog Paradigm Model-1*
computer (Ulmann, 2019, 2020), a modern modular academic analog computer and an ordinary
Intel© Whiskey Lake “ultra-low power mobile” processor (Core i7-8565U) as a
representative of a typical desktop-grade processor.
Within this experiment, we solve a simple^{1} test equation
${\mathrm{d}}^{\mathrm{2}}y/\mathrm{d}{t}^{\mathrm{2}}=\mathit{\lambda}y$ (with real-valued *y* and
$\mathit{\lambda}=\pm \mathrm{1}$) on both a digital and analog computer.

## 3.1 Time to solution

The digital computer solved the simple ordinary differential equation (ODE)
with simple text-book level scalar benchmark codes written in C and Fortran
and compiled with GCC.
Explicit (forward) integrator methods are adopted (Euler/Runge-Kutta).
The algorithm computed
$N=\mathrm{2}\times {\mathrm{10}}^{\mathrm{3}}$ timesteps with timestep size $\mathrm{\Delta}t=\mathrm{5}\times {\mathrm{10}}^{-\mathrm{4}}$ each
(see also Sect. 4 for a motivation for this time step size).
Therefore, it is ${t}_{\mathrm{final}}=N\mathrm{\Delta}t=\mathrm{1}$. No output^{2} was written during the benchmark to ensure
the best performance.
The time per element update (per integration step) was roughly (45±35) ns.
For statistical reasons, the computation was repeated and averaged 10^{5} times.
Depending on the order of the integration scheme, the overall
wall clock time was determined as ${T}_{\mathrm{D}}=(\mathrm{75}\pm \mathrm{45})$ µs in order to
achieve the simulation time *t*_{final}.

In contrast, the equation was implemented with integrating (and negating, if $\mathit{\lambda}=-\mathrm{1}$) operational
amplifiers on the *Analog Paradigm Model-1*. The machine approached
*t*_{final}=1 in a wall-clock time ${T}_{A}=\mathrm{1}\phantom{\rule{0.125em}{0ex}}\mathrm{s}/{k}_{\mathrm{0}}$ with
${k}_{\mathrm{0}}\in \mathit{\{}\mathrm{1},\mathrm{10},{\mathrm{10}}^{\mathrm{2}},{\mathrm{10}}^{\mathrm{3}},{\mathrm{10}}^{\mathrm{4}}\mathit{\}}$ the available integration speed factors on the machine
(Ulmann, 2019).
The *Analog Paradigm Model-1* reached the solution of ${y}^{\prime \prime}=y$ at *t*_{final}=1
in a wall-clock time *T*_{A}=100 µs at best.

Note how ${T}_{A}/{T}_{\mathrm{D}}\approx \mathrm{1}$, *i.e.*, in the case of the
smallest possible reasonable ODE, the digital computer (2020s energy efficient desktop
processor) is roughly as fast as the *Analog Paradigm Model-1*
(modern analog computer with an integration level comparable to the 1970s).

Looking forward, given the limited increase in clock frequency, with a faster processor one can
probably expect an improvement of *T*_{D} down to the order of 1 µs.
For an analog computer on a chip, one can
expect an improvement of *T*_{A} down to the order of 1 µs–10 ns.
This renders ${T}_{A}/{T}_{\mathrm{D}}\approx {\mathrm{10}}^{-(\mathrm{1}\pm \mathrm{1})}$ as a universal constant.

Summing up, with the given numbers above, as soon as the problem complexity grows, the analog computer outperforms the digital one, and this advantage increases linearly.

## 3.2 Energy and power consumption

The performance measure codes
`likwid`

(Hager et al., 2010; Röhl et al., 2017; Gruber et al., 2020)
and `perf`

(de Melo, 2010)
were used in order to measure
the overall floating-point operations (FLOP) and energy usage of the digital processor.
For the Intel mobile processor, this provided a power consumption of *P*_{D}=10 W during computing.
This number was derived directly from the CPU performance counters.
The overall energy requirement was then ${E}_{\mathrm{D}}={P}_{\mathrm{D}}{T}_{\mathrm{D}}=(\mathrm{0.9}\pm \mathrm{0.6})$ mJ.
Note that this number only takes the processor energy demands into account, not any other auxiliary
parts of the overall digital computer (such as memory, main board or power supply). For the overall
power consumption, an increase of at least 50 % is expected.

The analog computer energy consumption is estimated as *P*_{A}≈400 mW.
The number is based on measurements of actual *Analog Paradigm Model-1* computing units,
in particular 84 mW for a single
summer and 162 mW for a single integrator. The overall energy requirement is then
${E}_{A}={P}_{A}{T}_{A}=\mathrm{40}\phantom{\rule{0.125em}{0ex}}$µJ.

Note that ${P}_{\mathrm{D}}/{P}_{A}\approx \mathrm{25}$, while ${E}_{\mathrm{D}}/{E}_{A}\approx (\mathrm{2.25}\pm \mathrm{1.5})$. The conclusion is that the analog and digital computer require a similar amount of energy for the given computation, a remarkable result given the 40-year technology gap between the two architectures compared here.

For power consumption, it is hard to give a useful projection due to the accumulating administrative
overhead in case of parallel digital computing, such as data transfers, non-uniform memory
accesses (NUMA) and switching networking infrastructure. It can be assumed that this will change
the ratio ${E}_{\mathrm{D}}/{E}_{A}$ further in favor of the analog computer for both larger digital and
analog computers. Furthermore, higher integration levels lower *E*_{A}: the
*Analog Paradigm Model-1* analog computer is realized with an integration level comparable
with 1970s digital computers. We can reasonably expect a drop of two to three orders of
magnitude in power requirements with fully integrated analog computers.

## 3.3 Measuring computational power: FLOP per Joule

For the digital computer, the number of computed *floating-point operations* (FLOP^{3}) can be measured.
The overall single core nonvectorized performance was measured as $F\approx \mathrm{1}\phantom{\rule{0.125em}{0ex}}\mathrm{GFLOP}/\mathrm{s}$.
A single computation until *t*_{final} required roughly *F*_{D}=3 kFLOP.
The ratio ${F}_{\mathrm{D}}/{P}_{\mathrm{D}}=\mathrm{100}\phantom{\rule{0.125em}{0ex}}\mathrm{MFLOP}/\mathrm{J}$ is a measure of the number of computations per
energy unit on this machine. This performance was one to two orders less than typical HPC numbers.
This is because an energy-saving desktop CPU and not a high-end processor was benchmarked.
Furthermore, this benchmark was by purpose single-threaded.

In this non-vectorized benchmark, the reduced resolution of the analog computer was ignored.
In fact it is slightly lower than an IEEE 754 half precision
floating-point, compared to the double precision floating-point numbers in the digital benchmark.
One can then assign the analog computer a *time-equivalent* floating-point operation performance

The analog computer FLOP-per-Joule ratio (note that $\mathrm{FLOP}/\mathrm{J}=\mathrm{FLOPs}/\mathrm{W}$) is

That is, the analog computer's “FLOP per Joule” is slightly larger than for the digital one. Furthermore, one can expect an increase of ${F}_{A}/{E}_{A}$ by 10–100 for an analog computer chip. See for instance Cowan (2005) and Cowan et al. (2005), who claim $\mathrm{20}\phantom{\rule{0.125em}{0ex}}\mathrm{GFlop}/\mathrm{s}$. We expect $\mathrm{300}\phantom{\rule{0.125em}{0ex}}\mathrm{GFlop}/\mathrm{s}$ to be more realistic, thought (Table 2).

Keep in mind that the FLOP/s or FLOP/J measures are (even in the case of comparing two digital computers)
always problem/algorithm-specific (*i.e.*, in this case a Runge Kutta solver of ${y}^{\prime \prime}=y$) and therefore
controversial as a comparative figure.

This section presents forecasts about the solution of large scale differential equations. No benchmarks
have been carried out, because a suitable integrated analog computer on chip does not yet exist.
For the estimates, an analog computer on chip with
an average energy consumption of about *P*_{N}=4 mW per computing element
(*i.e.*, per integration, multiplication, etc.)
and maximum frequency *ν*=100 Mhz, which is refered to as the *analog* maximum frequency
*ν*^{A} in the following, was assumed.was assumed.^{4}
These numbers are several orders of magnitude
better than the *P*_{N}=160 mW and *ν*=100 kHz of the *Analog Paradigm Model-1* computer discussed
in the previous section. For the digital part, different systems than before are considered.

In general, the bandwidth of an analog computer depends on the frequency response characteristics of the elements, such as summers and integrators. The actual achievable performance also depends on the technology. A number of examples shall be given to motivate our numbers: In 65 nm CMOS technology, bandwidths of over 2 GHz are achievable with integrators (Breems et al., 2016). At unity-gain frequencies of 800 MHz to 1.2 Ghz and power consumption of less than 2 mW, integrators with a unity-gain frequency of 400 Mhz are achievable (Wang et al., 2018).

## 4.1 Solving PDEs on digital and analog computers

Partial differential equations (PDEs) are among the most important and powerful mathematical frameworks
for describing dynamical systems in science and engineering.
PDE solutions are usually fields $\mathit{u}=\mathit{u}(\mathit{r},t)$,
*i.e.*, functions^{5} of spatial position ** r** and time

*t*. In the following, we concentrate on initial value boundary problems (IVBP). These problems are described by a set of PDEs valid within a spatial and temporal domain and complemented with field values imposed on the domain boundary. For a review of PDEs, their applications and solutions see for instance Brezis and Browder (1998). In this text, we use computational fluid dynamics (CFD) as a representative theory for discussing general PDE performance. In particular, classical hydrodynamics (Euler equation) in a flux-conservative formulation is described by hyperbolic conservation laws in the next sections. Such PDEs have a long tradition of being solved with highly accurate numerical schemes.

Many methods exist for the spatial discretization. While finite volume schemes are popular for their
conservative properties, finite difference schemes are in general cheaper to implement. In this
work, we stick to simple finite differences on a uniform grid
with some uniform grid spacing Δ** r**.
The evolution vector field

**(**

*u***,**

*r**t*) is sampled on

*G*grid points per dimension and thus replaced by

*u*_{k}(

*t*) with $\mathrm{0}\le k<G$. It is worthwhile to mention that this approach works in classical orthogonal “dimension by dimension” fashion, and the number of total grid points is given by

*G*

^{D}. The computational domain is thus bound by $\mathrm{\Omega}=[{\mathit{r}}_{\mathrm{0}},{\mathit{r}}_{G}{]}^{\mathrm{D}}$. A spatial derivative ∂

_{i}

*f*is then approximated by a central finite difference scheme, for instance ${\partial}_{i}{f}_{k}\approx ({f}_{k+\mathrm{1}}-{f}_{k-\mathrm{1}})/\left(\mathrm{2}\mathrm{\Delta}x\right)+\mathcal{O}\left(\mathrm{\Delta}{x}^{\mathrm{2}}\right)$ for a second order accurate central finite difference approximation of the derivative of some function

*f*at grid point

*k*.

Many algorithmic solvers implement numerical schemes which exploit the
*vertical method of lines* (MoL) to rewrite the PDE into coupled ordinary differential equations
(ODEs). Once applied, the ODE system can be written as ${\partial}_{\mathrm{t}}{u}^{k}={G}^{k}(\mathit{u},\mathbf{\nabla}\mathit{u})$
with *u*^{k} denoting the time evolved (spatial) degrees of freedom and *G*^{k} functions
containing spatial derivatives (∂_{i}*u*^{j}) and algebraic sources. A standard
time stepping method determines a solution *u*(*t*_{1}) at later time *t*_{1}>*t*_{0} by basically
integrating ${u}^{k}\left({t}_{\mathrm{1}}\right)={\int}_{{t}_{\mathrm{0}}}^{{t}_{\mathrm{1}}}{G}^{k}\left(\mathit{u}\right(t\left)\right)\mathrm{d}\phantom{\rule{0.125em}{0ex}}t+{u}^{k}\left({t}_{\mathrm{0}}\right)$.
Depending on the details of the scheme, *G*^{k} is evaluated (probably repeatedly or in a
weak-form integral approach) during the time integration of the system. However, note
that other integration techniques exist, such as the arbitrary high order
ADER technique (Titarev and Toro, 2002, 2005).
The particular spatial discretization method has a big impact on the computational cost
of *G*^{i}. Here, we focus on the (simplest) finite difference technique,
where the number of neighbor communications per dimension grows linearly with the
*convergence order* of the scheme.

## 4.2 Classical Hydrodynamics on analog computers

The broad class of *fluid dynamics* will be discussed
as popular yet simple type of PDEs.
It is well known for its efficient description
of the flow of liquids and gases in motion and is applicable in many domains such as
aerodynamics, in life sciences as well as fundamental sciences
(Sod, 1985; Chu, 1979; Wang et al., 2019). In this text, the simplest formulation is investigated: the
Newtonian hydrodynamics (also refered to as *Euler equations*) with an ideal gas
equation of state. It is given by a nonlinear
PDE describing the time evolution of a mass density *ρ*, it's
velocity *v*^{i}, momentum *p*^{i}=*ρ**v*^{i} and energy $e=t+\mathit{\epsilon}$, with the
kinetic contribution $t=\mathit{\rho}\phantom{\rule{0.25em}{0ex}}{\mathit{v}}^{\mathrm{2}}/\mathrm{2}$ and an “internal” energy *ε*, which
can account for forces on smaller length scales than the averaged scale.

Flux conservative *Newtonian hydrodynamics* with an ideal gas equation of state
are one of the most elementary and text-book level formulations of fluid dynamics
(Toro, 1998; Harten, 1997; Hirsch, 1990). The
PDE system can be written in a dimension agnostic way in *D* spatial dimensions
(*i.e.*, independent of the particular choice for *D*) as

with $i,j\in [\mathrm{1},D]$.
Here, the pressure $p=\mathit{\rho}\mathit{\epsilon}(\mathrm{\Gamma}-\mathrm{1})$ defines the ideal gas equation of state,
with adiabatic index Γ=2 and *δ*^{ij} is the Kronecker delta.
A number of vectors are important in the following: The integrated state
or *evolved* vector ** u** in contrast to the primitive state vector or

*auxiliary*quantities $\mathit{v}\left(u\right)=(p,{v}^{i})$, which is a collection of so called

*locally reconstructed*quantities. Furthermore, the right hand sides in Eq. (7) do not explicitly depend on the spatial derivative ∂

^{i}

*ρ*, thus the conserved flux vector $\mathit{f}=\mathit{f}(\mathrm{\nabla}\mathit{q},\mathit{v})$ is only a function of the derivatives of the

*communicated*quantities $\mathit{q}=(e,{p}^{i})$ and the auxiliaries

**. Furthermore,**

*v***and**

*q***are both functions of**

*v***only.**

*u*** S**=0 is a source term. Some hydrodynamical models can be coupled by purely choosing
some nonzero

**, such as the popular**

*S**Navier Stokes*equations which describe viscous fluids. Compressible Navier Stokes equations can be written with a source term $\mathit{S}=\mathbf{\nabla}\cdot {\mathit{F}}^{v}$, with

with specific heats *c*_{p}, *c*_{v}, viscosity coefficient *μ*,
Prandtl number Pr and temperature *T* determined by the perfect gas equation of state, *i.e.*, $T=(e-{\mathit{v}}^{\mathrm{2}})/\left(\mathrm{2}{c}_{v}\right)$.
The computational cost from Euler equation to Navier Stokes equation is roughly *doubled*.
Furthermore, the partial derivatives on the velocities and temperatures also *double*
the quantities which must be communicated with each neighbor in every dimension.
We use Euler equations in the following section for the sake of simplicity.

## 4.3 Spatial discretization: Trading interconnections vs. computing elements

Schemes of (convergence) order *F* shall be investigated, which require the communication
with *F* neighbour elements.
For instance, a *F*=4th order accurate stencil has to communicate and/or
compute four neighbouring elements ${\mathit{f}}_{k-\mathrm{2}},{\mathit{f}}_{k-\mathrm{1}},{\mathit{f}}_{k+\mathrm{1}},{\mathit{f}}_{k+\mathrm{2}}$.
Typically, long-term evolutions are carried out with *F*=4 or *F*=6.
In the following, for simplicity, second order stencil (*F*=2) is chosen.
One identifies three different subcircuits

with
${\mathit{f}}_{k\pm \mathrm{1}}:={\mathit{f}}_{k}({\mathit{q}}_{k\pm \mathrm{1}},{\mathit{v}}_{k})$
and
*v*_{k}:=*v*_{k}(*u*_{k})
according to their previous respective definitions.
Figure 1 shows this “building block” for a single grid point, an exemplar
for up to *D*=2 dimensions with an *F*=2nd order finite difference stencil. The circuit identifies a
number of intermediate expressions which are labeled as these equations:

Just like in Fig. 1, all expressions which are vanishing in a single spatial dimension are
colored in red. Furthermore, note how the index *i* denotes the *x*-direction and *k* the *y*-direction,
and that there are different fluxes *f*^{j} in the particular directions. Equation (13)
is closed with the element-local auxiliary recovery

Note that one can trade neighbor communication (*i.e.*, number of wires between grid points)
for local recomputation. For instance, it would be mathematically clean to communicate only the
conservation quantities ** u** and reconstruct

**whenever needed. In order to avoid too many recomputations, some numerical codes also communicate parts of**

*v***. In an analog circuit, it is even possible to communicate parts of the finite differences, such as the Δ**

*v*

*v*_{i,k}quantities in Eq. (13).

The number of analog computing elements required to solve the Euler equation
on a single grid point is determined as ${N}_{\mathrm{single}}=\mathrm{5}D+\mathrm{5}F(D+\mathrm{2})+\mathrm{9}$,
with *D* being the number of spatial dimensions
and *F* the convergence order (*i.e.*, basically the finite
difference stencil size).
Typical choices of interest are convergence orders of $F\in [\mathrm{2},\mathrm{6}]$ in $D\in [\mathrm{1},\mathrm{3}]$ spatial dimensions.
Inserting the averaged $F=\mathrm{3}\pm \mathrm{1}$ and $D=\mathrm{2}\pm \mathrm{1}$ into *N*_{single} yields an averaged
${N}_{\mathrm{single}}\approx (\mathrm{84}\pm \mathrm{40})$ computing elements per spatial degree of freedom
(grid point) required for implementing Euler equations.

Unfortunately, this circuit is too big to fit on the *Analog Paradigm Model-1* computer resources
available. Consequently the following discussion is based on a future implementation using a large number of
interconnected analog chips. It is noteworthy that this level of integration is necessary to implement
large scale analog computing applications. With *P*_{N}=4 mW per computing element,
the average power per spatial degree of freedom (*i.e.*, single grid point) is ${P}_{N\mathrm{D}}=(\mathrm{336}\pm \mathrm{160})$ mW.

## 4.4 Time to solution

Numerical PDE solvers are typically benchmarked using a *wall-clock time per degree of freedom update* measure *T*_{DOF}, where *element update* typically means a time integration timestep.
In this measure, the overall wall clock time is normalized (divided) by the number of spatial
degrees of freedom as well as the number of parallel processors involved.

The fastest digital integrators found in literature carry out a time per degree of freedom
update ${T}_{\mathrm{DOF}}={\mathrm{10}}^{\mathrm{1}\pm \mathrm{1}}$ µs. Values smaller
than 1 µs require already the use of sophisticated communication avoiding
numerical schemes such as discontinuous Galerkin (DG) schemes.^{6} For instance,
Dumbser et al. (2008) demonstrate the superiority of so called *P*_{N}*P*_{M}
methods (polynomial of degree *N* for reconstruction and *M* for
time integration, where the limit *P*_{0}*P*_{M} denotes a standard high-order finite volume scheme)
by reporting *T*_{DOF}=0.8 µs for a *P*_{2}*P*_{2} method when solving two-dimensional Euler
equations. Diot et al. (2013) report an adaptive scheme which performs no faster than
*T*_{EU}=30 µs when applied to three-dimensional Euler equations.
The predictor-corrector arbitrary-order ADER scheme applied by Köppel (2018) and Fambri et al. (2018)
to the *general-relativistic magnetodynamic* extension of hydrodynamics reported *T*_{DOF}=41 µs
as the fastest speed obtained. The non-parallelizable evaluation of more complex hydrodynamic models is clearly
reflected in the increasing times *T*_{DOF}.

Recalling the benchmark result of *T*_{DOF}∼45 ns from Sect. 3.1,
the factor of 1000 is mainly caused by the inevitable communication required for obtaining
neighbor values when solving *f*(*y*,∇*y*) in ${\partial}_{\mathrm{t}}y=f\left(y\right)$. Switched networks have an
intrinsic communication latency and one cannot expect *T*_{DOF} to shrink significantly, even for
newer generations of supercomputers. A key advantage of analog computing is that grid neighbor communication
happens continuously in the same time as in the grid-local circuit. That is, no time is lost for communication.

One can do a comparison with the analog computer without knowing the simulation time step size Δ*t*.
The reasoning is based on the maximum frequency, *i.e.*, the shortest wavelength which can be resolved with
a (first order in time^{7}) numerical scheme is ${f}_{\mathrm{sim}}:=\mathrm{1}/\left(\mathrm{10}\mathrm{\Delta}t\right)$, *c.f.*, Fig. 2.
The factor $\mathrm{10}=\mathrm{2}\cdot \mathrm{5}$ includes a factor of
2 due to the Nyquist-Shannon sampling theorem, while the factor of 5 is chosen to take into account that
a numerical scheme can marginally reconstruct a wave at frequency $f=\mathrm{1}/\left(\mathrm{2}\mathrm{\Delta}t\right)$ by two points while it
can be obtained perfectly by the analog computer (down to machine precision without any artifacts). The
integration of signals beyond the maximum frequency results in a nonlinear response which heavily depends on the
electrical details of the circuit (which are beyond the scope of the analog computer architecture discussed in
this paper). One can demand that the numerical integrator time resolution is good enough to reconstruct a signal
*without* prior knowledge on the wave form even at the maximum frequency.^{8} This drives the demand for 5 additional
sampling points per half-wave, in order to make analog and digital outcome comparable (see also Fig. 2).

It is noted that this argument is relevant as long as one is interested in obtaining and preserving the *correct* time
evolution (of a system described by the differential equation) with an analog or digital computer, respectively.
In general, it is not valid to reduce the computational correctness within the solution domain of an initial value problem as
this will invalidate any later solution.

By assigning the numerical PDE solver a maximum frequency identical to the highest frequency which can be
evolved by the scheme in a given time, one introduces an *effective digital computer maximum frequency*

Note how the mapping of simulation time (interval) Δ*t* to
wall-clock time (interval) *T*_{DOF} results in a mapping of
simulation frequency *f*_{sim} to wall-clock (or real-time) frequency
*ν*^{D} (Fig. 2).

The calculated ${\mathit{\nu}}^{\mathrm{D}}={\mathrm{10}}^{-\mathrm{2}\pm \mathrm{1}}$ MHz has to be contrasted with *ν*^{A}=100 MHz of the analog computer chip. One can conclude that
analog computers can solve large scale high performance computing at least ${\mathit{\nu}}^{A}/{\mathit{\nu}}^{\mathrm{D}}={\mathrm{10}}^{\mathrm{3}\pm \mathrm{1}}$ times faster
than the digital ones, when *T*_{A} and *T*_{D} are the analog and digital time to solution.
Since $T\sim \mathrm{1}/\mathit{\nu}$, the resolution time reduces accordingly and ${T}_{A}/{T}_{\mathrm{D}}={\mathrm{10}}^{-\mathrm{3}\pm \mathrm{1}}$.

This is a remarkable result as it already assumes the fastest numerical integration schemes on a
*perfectly* scaling parallel digital computer. In practical problems, these assumptions are hardly
ever met: The impossibility of (ideal) parallelization is one of the major drawbacks of digital computing. Nevertheless,
the above results show that even without these drawbacks, the analog computer is orders of magnitude faster.
Notably, while it needs careful adjustment both the problem and the code for a high-performance computer to achieve
acceptable parallel performance, when using an analog computer these advantages come effortless. The only
way to reduce the speed or timing advantage is to choose a disadvantegeous or unsuitable number scaling.

In this study the low resolution of an analog computer
(which is effectively IEEE 754 half precision floating-point) has been neglected.
In fact, high order time integration schemes
can invest computing time in order to achieve machine level accuracy which a typical error
$\mathrm{\Delta}{f}_{\mathrm{digital}}\sim {\mathrm{10}}^{-\mathrm{10}}$
on some evolved function or field *f* and an error definition
$\mathrm{\Delta}{f}_{\mathrm{simulation}}:=({f}_{\mathrm{simulation}}-{f}_{\mathrm{exact}})/{f}_{\mathrm{exact}}$.
An analog computer is limited by its intrinsic accuracy with
a typical error $\mathrm{\Delta}{f}_{\mathrm{analog}}\sim {\mathrm{10}}^{-(\mathrm{4}\pm \mathrm{1})}$
(averaging over the *Analog Paradigm Model-1* and future analog computers on chip).

## 4.5 Energy and power consumption

One expects the enormous speedup ${T}_{A}/{T}_{\mathrm{D}}$ of the analog computer to result in a much lower energy
budget ${E}_{\mathrm{D}}=({T}_{\mathrm{D}}/{T}_{A}){E}_{A}={\mathrm{10}}^{\mathrm{3}\pm \mathrm{1}}{E}_{A}$ for a given problem. However, as the power requirement
is proportional to the analog computer size, *P*_{A}=*N**P*_{ND}, the problem size (number of grid points)
which can be handled by
the analog computer is limited by the overall power consumption.
For instance, with a typical high performance computer power consumption of *P*_{A}=20 MW,
one can simultaneously evolve a grid with $N={P}_{A}/{P}_{N\mathrm{D}}={\mathrm{10}}^{\mathrm{11}\pm \mathrm{0.5}}$ points. This is in the same order of magnitude
as the largest scale computational fluid dynamics simulations evolved on digital high performance computer
clusters (*c.f.*, Green 500 list, Subramaniam et al., 2013, 2020). Note that in such a setup, the solution
is obtained on average 10^{3±1} times faster with a purely analog computer and consequently also the
energy demand is 10^{3±1} times lower.

Just to depict an analog computer of this size: Given 1000 computing elements per chip, 1000 chips per rack unit, 40 units per rack still requires 2500 racks to build such a computer in a traditional design. This is one order of magnitude larger than the size of typical high performance centers. Clearly, at such a size the interconnections will also have a considerable power consumption, even if the monumental engineering challenges for such a large scale interconnections can be met. On a logical level, interconnections are mostly wires and switches (which require little power, compared to computing elements). This can change dramatically with level converters and an energy estimate is beyond the scope of this work.

## 4.6 Hybrid techniques for trading power vs. time

The analog computers envisaged so far have to grow with problem size (*i.e.*, with grid size, but also with equation complexity).
Modern chip technology could make it
theoretically possible to build a computer with 10^{12} analog computing elements, which is many orders of
magnitude larger than any analog computer that has been built so far (about 10^{3} computing elements
at maximum). The idea of combining
an analog and a digital computer thus forming a *hybrid computer*
featuring analog and digital computing elements is not new.
With the digital memory and algorithmically controlled program flow, a small analog computer
can be used repeatedly on a larger problem under control of the digital computer it is mated to.
Many attempts at solving
PDEs on hybrid computers utilized the analog computer for computing the
element-local updated state with the digital computer looping over the spatial
degrees of freedom. In such a scheme, the analog computer fulfils the role
of an *accelerator* or *co-processor*. Such attempts are subject of various
historical (such as Nomura and Deiters, 1968; Reihing, 1959; Vichnevetsky, 1968, 1971; Volynskii and Bukham, 1965; Bishop and Green, 1970; Karplus and Russell, 1971; Feilmeier, 1974)
and contemporary studies (for instance Amant et al., 2014; Huang et al., 2017).

A simple back-of-the-envelope estimation with a modern hybrid computer tackling the
*N*=10^{11} problem is described below. The aim is to trade the sheer number of computing elements with their
electrical power *P*, respectively, against solution time *T*.
It is assumed that the analog-digital hybrid scheme works similarly to numerical parallelization:
The simulation domain with *N* degrees of freedom is divided into *Q* parts which can be evolved independently to a certain
degree (for instance in a predictor-corrector scheme). This allows to use a smaller analog
computer which only needs to evolve $N/Q$ degrees of freedom at a time. While the power consumption of such a
computer is reduced to ${P}_{A}\to {P}_{A}/Q$, the time to solution increases to *T*_{A}→*Q**T*_{A}. Of course, the
overall required energy remains the same, ${E}_{A}={P}_{A}{T}_{A}=({P}_{A}/Q)\left(Q{T}_{A}\right)$.

In this simple model, energy consumption of the digital part in the hybrid computer as well as numerical details
of the analog-digital hybrid computer scheme have been neglected. This includes the time-to-solution overhead introduced
by the numerical scheme implemented by the digital computer (negligible for reasonably small *Q*) and the power demands of
the ADC/DAC (analog-to-digital/digital-to-analog) converters (an overhead which scales with $(D+\mathrm{2}){G}^{\mathrm{D}}/Q$, *i.e.*, the state vector
size per grid element).

Given a fixed four orders of magnitude speed difference ${\mathit{\nu}}^{\mathrm{D}}/{\mathit{\nu}}^{A}={\mathrm{10}}^{\mathrm{4}}$ and a given physical problem
with grid size *N*=10^{11}, one can build an analog-digital hybrid computer which requires less power and is
reasonably small so that the overall computation is basically still done in the analog domain and digital effects will not dominate.
For instance, with *Q* chosen just as big as $Q={\mathit{\nu}}^{\mathrm{D}}/{\mathit{\nu}}^{A}$, the analog computer
would evolve only $N/Q={\mathrm{10}}^{\mathrm{7}}$ points in time, but run 10^{4} times “in repetition”. The required power
reduces from cluster-grade to desktop-grade ${P}_{A}=(N/Q){P}_{N\mathrm{D}}=\mathrm{3.3}$ kW.
The runtime advantage is of course lost, ${T}_{\mathrm{D}}/{T}_{A}=\left(Q{\mathit{\nu}}^{A}\right)/{\mathit{\nu}}^{\mathrm{D}}=\mathrm{1}$.

Naturally, this scenario can also be applied to solve larger problems with a given grid size. For instance,
given an analog computer with the size of *N*=10^{11} grid points, one can solve a grid of size *Q**N* by
succesively evolving *Q* parts of the computer with the same power *P*_{A} as for a grid of size *N*. Of course, the
overall time to solution and energy will grow with *Q*. In any case, time and energy remain (3±1) orders of
magnitude lower than for a purely digital computer solution.

In Sect. 2, we have shown the time and power needs of analog computers
are *orthogonal* to those of digital computers. In Sect. 3, we performed
an actual miniature benchmark of a commercially available
*Analog Paradigm Model-1* computer versus a mobile Intel© processor.
The results are remarkable in several ways: The modern analog computer *Analog Paradigm Model-1*,
uses integrated circuit technology which is comparable to the 1970s digital integration level.
Nevertheless it achieves competitive results in computational power and energy consumption compared
to a mature cutting-edge digital processor architecture which has been developed by one of the largest
companies in the world.
We also computed a problem-dependent effective FLOP/s value for
the analog computer. For the key performance measure for energy-efficient computing,
namely *FLOP-per-Joule*, the analog computer again obtains remarkable results.

Note that while FLOP/s is a popular measure in scientific computing, it is always application- and
algorithm-specific. Other measures exist, such as transversed edges per second
(TEPS) or synaptic updates per second (SUPS). Cockburn and Shu (2001) propose for instance to measure
the *efficiency* of a PDE solving method by computing the inverse of the
product of the (spatial-volume integrated) *L*^{1}-error times the computational cost in
terms of time-to-solution or invested resources.

In Sect. 4, large scale applications were discussed on the example of fluid
dynamics and by comparing high performance computing results with a prospected analog
computer chip architecture. Large scale analog applications can become power-bound and
thus require the adoption of analog-digital hybrid architectures. Nevertheless, with their 𝒪(1)
runtime scaling, analog computers excel for time integrating large coupled systems where
algorithmic approaches suffer from communication costs. We predict outstanding advantages in
terms of time-to-solution when it comes to large scale analog computing. Given the advent
of chip-level analog computing, a *gigascale* analog computer (a device with ∼10^{9}
computing elements) could become a *game changer* in this decade. Of course, major obstacles
have to be addressed to realize such a computer, such as the interconnection toplogy and
realization in an (energy) efficient manner.

Furthermore, there are a number of different approaches in the field of partial differential
equations which might be even better suited to analog computing. For instance, solving
PDEs with artificial intelligence has become a fruitful research field in the last
decade
(see for instance Michoski et al., 2020; Schenck and Fox, 2018), and analog neural networks
might be an interesting candidate to challenge digital approaches.
Number representation on analog computers can be
nontrivial when the dynamical range is large. This is frequently the case with fluid dynamics,
where large density fluctiations are one reason why perturbative solutions fail and numerical
simulations are carried out in the first place. One reason why
*indirect* alternative approaches such as neural networks could be better suited than
*direct* analog computing networks is that this problem is avoided.
Furthermore, the demand for high accuracy in fluid dynamics can not easily fulfilled by low resolution
analog computing. In the end, it is quite possible that
a small-sized analog neural network might outperform a large-sized classical pseudo-linear time evolution
in terms of time-to-solution and energy requirements. Most of these engineering challenges
have not been discussed in this work and are subject to future studies.

The software code is available in the Supplement.

No data sets were used in this article.

The supplement related to this article is available online at: https://doi.org/10.5194/ars-19-105-2021-supplement.

BU performed the analog simulations. SK carried out the numerical simulations and the estimates. DK implements the concept for an integrated analog computer. All authors contributed to the article.

The authors declare that they have no conflict of interest.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the special issue “Kleinheubacher Berichte 2020”.

We thank our anonymous referees for helpful comments and corrections. We further thank Chris Giles for many corrections and suggestions which improved the text considerably.

This paper was edited by Madhu Chandra and reviewed by two anonymous referees.

Amant, R., Yazdanbakhsh, A., Park, J., Thwaites, B., Esmaeilzadeh, H., Hassibi, A., Ceze, L., and Burger, D.: General-purpose code acceleration with limited-precision analog computation, in: ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), vol. 42, pp. 505–516, https://doi.org/10.1109/ISCA.2014.6853213, 2014. a

Bishop, K. and Green, D.: Hybrid Computer Impelementation of the Alternating Direction Implicit Procedure for the Solution of Two-Dimensional, Parabolic, Partial-Differential Equations, AIChE Journal, 16, 139–143, https://doi.org/10.1002/aic.690160126, 1970. a

Bournez, O. and Pouly, A.: A Survey on Analog Models of Computation, in: Handbook of Computability and Complexity, Springer, Cham, pp. 173–226, https://doi.org/10.1007/978-3-030-59234-9_6, 2021. a

Breems, L., Bolatkale, M., Brekelmans, H., Bajoria, S., Niehof, J., Rutten, R., Oude-Essink, B., Fritschij, F., Singh, J., and Lassche, G.: A 2.2 GHz Continuous-Time Delta Sigma ADC With −102 dBc THD and 25 MHz Bandwidth, IEEE J. Solid-St. Circ., 51, 2906–2916, https://doi.org/10.1109/jssc.2016.2591826, 2016. a

Brezis, H. and Browder, F.: Partial Differential Equations in the 20th Century, Adv. Math., 135, 76–144, https://doi.org/10.1006/aima.1997.1713, 1998. a

Calude, C. S., Pa˘un, G., and Ta˘ta˘râm, M.: A Glimpse into natural computing. Centre for Discrete Mathematics and Theoretical Computer Science, The University of Auckland, New Zealand, available at: https://www.cs.auckland.ac.nz/research/groups/CDMTCS/researchreports/download.php?selected-id=93 (last access: 2 August 2021), 1999. a

Chu, C.: Numerical Methods in Fluid Dynamics, Adv. Appl. Mech., 18, 285–331, https://doi.org/10.1016/S0065-2156(08)70269-2, 1979. a

Cockburn, B. and Shu, C.-W.: Runge–Kutta Discontinuous Galerkin Methods for Convection-Dominated Problems, J. Sci. Comput., 16, 173–261, https://doi.org/10.1023/A:1012873910884, 2001. a, b

Cowan, G., Melville, R. C., and Tsividis, Y. P.: A VLSI analog computer/math co-processor for a digital computer, ISSCC Dig. Tech. Pap. I, San Francisco, CA, USA, 10–10 February 2005, vol. 1, pp. 82–586, https://doi.org/10.1109/ISSCC.2005.1493879, 2005. a

Cowan, G. E. R.: A VLSI analog computer/math co-processor for a digital computer, PhD thesis, Columbia University, available at: http://www.cisl.columbia.edu/grads/gcowan/vlsianalog.pdf (last access: 2 August 2021), 2005. a

Dahlquist, G. and Jeltsch, R.: Generalized disks of contractivity for explicit and implicit Runge-Kutta methods, Royal Institute of Technology, Stockholm, Sweden, available at: https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2008/2008-20.pdf (last access: 2 August 2021), 1979. a

Deaton, R., Garzon, M., Rose, J., Franceschetti, D., and Stevens, S.: DNA Computing: A Review, Fund. Inform., 35, 231–245, https://doi.org/10.3233/FI-1998-35123413, 1998. a

de Melo, A. C.: The New Linux 'perf' Tools, Tech. rep., available at: http://vger.kernel.org/~acme/perf/lk2010-perf-paper.pdf (last access: 27 July 2021), 2010. a

Diot, S., Loubère, R., and Clain, S.: The MOOD method in the three-dimensional case: Very-High-Order Finite Volume Method for Hyperbolic Systems, Int. J. Numer. Meth. Fl., 73, 362–392 https://doi.org/10.1002/fld.3804, 2013. a

Dumbser, M., Balsara, D. S., Toro, E. F., and Munz, C.-D.: A unified framework for the construction of one-step finite volume and discontinuous Galerkin schemes on unstructured meshes, J. Comput. Phys., 227, 8209–8253, https://doi.org/10.1016/j.jcp.2008.05.025, 2008. a

Fambri, F., Dumbser, M., Köppel, S., Rezzolla, L., and Zanotti, O.: ADER discontinuous Galerkin schemes for general-relativistic ideal magnetohydrodynamics, Mon. Not. R. Astron. Soc., 477, 4543–4564, https://doi.org/10.1093/mnras/sty734, 2018. a

Feilmeier, M.: Hybridrechnen, Springer, Basel, https://doi.org/10.1007/978-3-0348-5490-0, 1974. a

Georgescu, I. M., Ashhab, S., and Nori, F.: Quantum simulation, Rev. Mod. Phys., 86, 153–185, https://doi.org/10.1103/revmodphys.86.153, 2014. a

Gruber, T., Eitzinger, J., Hager, G., and Wellein, G.: LIKWID 5: Lightweight Performance Tools, Zenodo [data set], https://doi.org/10.5281/zenodo.4275676, 2020. a

Gustafson, J. L.: Reevaluating Amdahl′s law, Commun. ACM, 31, 532–533, https://doi.org/10.1145/42411.42415, 1988. a

Hager, G., Wellein, G., and Treibig, J.: LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, in: 2010 39th International Conference on Parallel Processing Workshops, IEEE Computer Society, Los Alamitos, CA, USA, 13–16 September 2010, pp. 207–216, https://doi.org/10.1109/ICPPW.2010.38, 2010. a

Harten, A.: High resolution schemes for hyperbolic conservation laws, J. Computat. Phys., 135, 260–278, https://doi.org/10.1006/jcph.1997.5713, 1997. a

Hirsch, C.: Numerical computation of internal and external flows, in: Computational Methods for Inviscid and Viscous Flows, vol. 2, John Wiley & Sons, Chichester, England and New York, 1990. a

Huang, Y., Guo, N., Seok, M., Tsividis, Y., Mandli, K., and Sethumadhavan, S.: Hybrid analog-digital solution of nonlinear partial differential equations, in: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, ACM, 665–678, https://doi.org/10.1145/3123939.3124550, 2017. a

Karplus, W. and Russell, R.: Increasing Digital Computer Efficiency with the Aid of Error-Correcting Analog Subroutines, IEEE T. Comput., C-20, 831–837, https://doi.org/10.1109/T-C.1971.223357 1971. a

Kendon, V. M., Nemoto, K., and Munro, W. J.: Quantum analogue computing, Philos. T. R. Soc. A, 368, 3609–3620, https://doi.org/10.1098/rsta.2010.0017, 2010. a

Köppel, S.: Towards an exascale code for GRMHD on dynamical spacetimes, J. Phys. Conf. Ser., 1031, 012017, https://doi.org/10.1088/1742-6596/1031/1/012017, 2018. a

MacLennan, B. J.: Natural computation and non-Turing models of computation, Theor. Comput. Sci., 317, 115–145, https://doi.org/10.1016/j.tcs.2003.12.008, 2004. a

MacLennan, B. J.: Analog Computation, in: Computational Complexity, edited by: Meyers, R., Springer, New York, pp. 161–184, https://doi.org/10.1007/978-1-4614-1800-9_12, 2012. a

MacLennan, B. J.: Unconventional Computing, University of Tennessee, Knoxville, Tennessee, USA, available at: http://web.eecs.utk.edu/~bmaclenn/Classes/494-594-UC/handouts/UC.pdf (last access: 27 July 2021), 2019. a

Michoski, C., Milosavljević, M., Oliver, T., and Hatch, D. R.: Solving differential equations using deep neural networks, Neurocomputing, 399, 193–212, https://doi.org/10.1016/j.neucom.2020.02.015, 2020. a

Nomura, T. and Deiters, R.: Improving the analog simulation of partial differential equations by hybrid computation, Simulation, 11, 73–80, https://doi.org/10.1177/003754976801100207, 1968. a

Reihing, J.: A time-sharing analog computer, in: Proceedings of the western joint computer conference, San Francisco, CA, USA, 3–5 March 1959, 341–349, https://doi.org/10.1145/1457838.1457904, 1959. a

Rodgers, D. P.: Improvements in Multiprocessor System Design, SIGARCH Comput. Archit. News, 13, 225–231, https://doi.org/10.1145/327070.327215, 1985. a, b

Röhl, T., Eitzinger, J., Hager, G., and Wellein, G.: LIKWID Monitoring Stack: A Flexible Framework Enabling Job Specific Performance monitoring for the masses, 2017 IEEE International Conference on Cluster Computing (CLUSTER), Honolulu, HI, USA, 5–8 September 2017, 781–784, https://doi.org/10.1109/CLUSTER.2017.115, 2017. a

Schenck, C. and Fox, D.: Spnets: Differentiable fluid dynamics for deep neural networks, in: Conference on Robot Learning, PMLR, 87, 317–335, available at: http://proceedings.mlr.press/v87/schenck18a.html (last access: 27 July 2021), 2018. a

Schuman, C. D., Potok, T. E., Patton, R. M., Birdwell, J. D., Dean, M. E., Rose, G. S., and Plank, J. S.: A Survey of Neuromorphic Computing and Neural Networks in Hardware, arXiv [preprint], arXiv:1705.06963v1, 19 May 2017. a

Shu, C.-W.: High order WENO and DG methods for time-dependent convection-dominated PDEs: A brief survey of several recent developments, J. Comput. Phys., 316, 598–613, https://doi.org/10.1016/j.jcp.2016.04.030, 2016. a

Siegelmann, H. T.: Computation Beyond the Turing Limit, Science, 268, 545–548, https://doi.org/10.1126/science.268.5210.545, 1995. a

Sod, G.: Numerical Methods in Fluid Dynamics: Initial and Initial Boundary-Value Problems, Cambridge University Press, Cambridge, UK, 1985. a

Subramaniam, B., Saunders, W., Scogland, T., and Feng., W.-c.: Trends in Energy-Efficient Computing: A Perspective from the Green500, in: Proceedings of the International Green Computing Conference, Arlington, VA, USA, 27–29 June 2013, https://doi.org/10.1109/IGCC.2013.6604520, 2013. a

Subramaniam, B., Scogland, T., Feng, W.-c., Cameron, K. W., and Lin, H.: Green 500 List, 2020, available at: http://www.green500.org (last access: 28 July 2021), 2020. a

Titarev, V. A. and Toro, E. F.: ADER: Arbitrary High Order Godunov Approach, J. Sci. Comput., 17, 609–618, https://doi.org/10.1023/A:1015126814947, 2002. a

Titarev, V. A. and Toro, E. F.: ADER schemes for three-dimensional non-linear hyperbolic systems, J. Comput. Phys., 204, 715–736, https://doi.org/10.1016/j.jcp.2004.10.028, 2005. a

Toro, E. F.: Primitive, Conservative and Adaptive Schemes for Hyperbolic Conservation Laws, in: Numerical Methods for Wave Propagation. Fluid Mechanics and Its Applications, edited by: Toro, E. F. and Clarke, J. F., vol. 47, Springer, Dordrecht, the Netherlands, pp. 323–385, https://doi.org/10.1007/978-94-015-9137-9_14, 1998. a

Ulmann, B.: Model-1 Analog Computer Handbook/User Manual, available at: http://analogparadigm.com/downloads/handbook.pdf (last access: 28 July 2021), 2019. a, b

Ulmann, B.: Analog and Hybrid Computer Programming, De Gruyter Oldenbourg, Berlin, Boston, https://doi.org/10.1515/9783110662207, 2020. a, b

Vichnevetsky, R.: A new stable computing method for the serial hybrid computer integration of partial differential equations, in: Spring Joint Computer Conference, Atlantic City New Jersey, 30 April–2 May 1968, 143–150, https://doi.org/10.1145/1468075.1468098, 1968. a

Vichnevetsky, R.: Hybrid methods for partial differential equations, Simulation, 16, 169–180, 1971. a

Volynskii, B. A. and Bukham, V. Y.: Analogues for the Solution of Boundary-Value Problems, 1st edn., in: International Tracts in Computer Science and Technology and Their Application, Oxford, London, 1965. a

Wang, W., Zhu, Y., Chan, C.-H., and Martins, R. P.: A 5.35-mW 10-MHz Single-Opamp Third-Order CT Delta Sigma Modulator With CTC Amplifier and Adaptive Latch DAC Driver in 65-nm CMOS, IEEE J. Solid-St. Circ., 53, 2783–2794, https://doi.org/10.1109/jssc.2018.2852326, 2018. a

Wang, Y., Yu, B., Berto, F., Cai, W., and Bao, K.: Modern numerical methods and theirapplications in mechanical engineering, Adv. Mech. Eng., 11, 1–3, https://doi.org/10.1177/1687814019887255, 2019. a

Wilhelm, F., Steinwandt, R., Langenberg, B., Liebermann, P., Messinger, A., Schuhmacher, P., and Misra-Spieldenner, A.: Status of quantum computer development, Version 1.2, BSI Project Number 283, Federal Office for Information Security, available at: https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Publikationen/Studien/Quantencomputer/P283_QC_Studie-V_1_2.html (last access: 28 July 2021), 2020. a

Zhou, Y., Stoudenmire, E. M., and Waintal, X.: What Limits the Simulation of Quantum Computers?, Phys. Rev. X, 10, 041038, https://doi.org/10.1103/physrevx.10.041038, 2020. a

Ziegler, M.: Novel hardware and concepts for unconventional computing, Sci. Rep., 10, 11843, https://doi.org/10.1038/s41598-020-68834-1, 2020. a

^{1}

This equation is inspired by the Dahlquist and Jeltsch (1979) *test equation* ${y}^{\prime}=\mathit{\lambda}y$
used for stability studies. The advantage of using an oscillator is the self-similarity of the
solution which can be observed over a long time.

^{2}

Both in terms of *dense output* or
any kind of evolution tracking. A textbook-level approach with minimal memory footprint is adopted
which could be considered an *in-place* algorithm.

^{3}

sic! We either argue with *overall* FLOP and Energy (Joule) or *per second* quantities such as
FLOP/s (in short FLOPS) and Power (Watt). In order to avoid confusion, we avoid the abbreviation
“FLOPS” in the main text.
Furthermore, SI prefixes are used, *i.e.*, kFLOP=10^{3} FLOP,
MFLOP=10^{6} FLOP and GFLOP=10^{9} FLOP.

^{4}

Summation will be done implicitly on chip by making use of Kirchhoff's law (current summing) so that no explizit computing element are required for this operation.

^{5}

The explicit dependency on ** r** and

*t*is omitted in the following text.

^{6}

*h*−*p* methods, which provide both
mesh refinement in grid spacing *h* as well as a “local” high order description typically
in some function base expansion of order *p*. For reviews, see for instance
Cockburn and Shu (2001) or Shu (2016).

^{7}

For a high order time integration scheme, the cutoff increases formally linearly as ${f}_{\mathrm{0}}\sim p/\left(\mathrm{10}{T}_{\mathrm{DOF}}\right)$. That is, for a fourth order scheme, the digital computer is effectively four times faster in this comparison.

^{8}

Note that on a digital computer, the maximum frequency is identical to a *cutoff frequency*
(also refered to as *ultraviolet cutoff*). On analog computers, there is no such *hard*
cutoff as computing elements tend to be able to compute with decreased quality at higher frequencies.

- Abstract
- Introduction
- A Simple (Linear) Model for Comparing Analog and Digital Performance
- A performance survey on solving ordinary differential equations (ODEs)
- PDEs and many degrees of freedom
- Summary and outlook
- Code availability
- Data availability
- Author contributions
- Competing interests
- Disclaimer
- Special issue statement
- Acknowledgements
- Review statement
- References
- Supplement

- Abstract
- Introduction
- A Simple (Linear) Model for Comparing Analog and Digital Performance
- A performance survey on solving ordinary differential equations (ODEs)
- PDEs and many degrees of freedom
- Summary and outlook
- Code availability
- Data availability
- Author contributions
- Competing interests
- Disclaimer
- Special issue statement
- Acknowledgements
- Review statement
- References
- Supplement