Using analog computers in today's largest computational challenges
Bernd Ulmann
Lars Heimann
Dirk Killat
Analog computers can be revived as a feasible technology platform for low-precision, energy-efficient and fast computing. We justify this statement by measuring the performance of a modern analog computer and comparing it with that of traditional digital processors. General statements are made about the solution of ordinary and partial differential equations. Computational fluid dynamics is discussed as an example of large-scale scientific computing applications. Several models are proposed which demonstrate the benefits of analog and digital-analog hybrid computing.
Digital computing has transformed many — if not close to all — aspects of industry, humanities and science. Turing completeness allows statements to be made about the computability and decidability of problems and computational power of machines. Digital storage has undergone numerous technological advances and is available in increasingly vast amounts. Nevertheless, contemporary digital computing is possibly not the last word in computing, despite its dominance in the consumer market for the last 40+ years.
Fundamental research on non-traditional (also referred to as unconventional or exotic) computing is taking place in material sciences and chemistry, but also in more exotic branches such as biology and the life sciences. Amongst others, beyond-Turing computing (Siegelmann, 1995), natural computing (Calude et al., 1999), neuromorphic computing (Schuman et al., 2017; Ziegler, 2020) and quantum computing (Zhou et al., 2020; Georgescu et al., 2014; Kendon et al., 2010) are fields of active investigation. Being fundamental research at heart, these disciplines come with technological challenges. For instance, computing with DNA still requires the use of large-scale laboratory equipment and machinery (Deaton et al., 1998). Currently, not only the low-temperature laboratory conditions but also the necessary error-correction schemes challenge practical quantum computers (Wilhelm et al., 2017). This currently negates any practical advantage over silicon-based digital computing. Furthermore, all of these alternative (or exotic) computer architectures share the characteristic that they are fundamentally non-portable. This means they will have to be located at large facilities and dedicated special-purpose computing centers for a long time, if not forever. This is not necessarily a practical drawback, since the internet allows for the delocalization of systems.
In contrast to this, silicon-based electronic analog computing is a technology with a rich history which operates in a normal workplace environment (non-laboratory conditions; Ulmann, 2020). Digital computers overtook their analog counterparts in the last century, primarily due to their ever-increasing clock speeds and the flexibility that comes from their algorithmic approach and the possibility of using these machines in a time-shared environment. Today, however, Moore's law is coming to a hard stop, and processor clock speeds have not increased significantly in the past decade. Many-core architectures and vectorization come with their own share of problems, given their fundamental limits as described, for instance, by Amdahl's law (Rodgers, 1985). GPGPUs and specialized digital computing chips concentrate on vectorized, and even data-flow-oriented, programming paradigms but are still limited by parasitic capacitances, which determine the maximum possible clock frequency and impose a noticeable energy barrier.
Thanks to their properties, analog computers have attracted the interest of many research groups. For surveys of theory and applications, see for instance Bournez and Pouly (2021) or the works of MacLennan (2004, 2012, 2019). In this paper, we study the usability of analog computers for applications in science. The fundamental properties of analog computers are low power requirements, low-resolution computation and intrinsic parallelism. Two very different use cases can be identified: high-performance computing (HPC) and low-energy portable computing. The energy and computational demands of these two scenarios are diametrically opposed; this paper is primarily focused on HPC.
The paper is structured as follows: In Sect. 2, we review the general assumptions about digital and analog computing. In Sect. 3, small scale benchmark results are presented for a simple ordinary differential equation. In Sect. 4, a typical partial differential equation is considered as an example for a large scale problem. Spatial discretization effects and computer architecture design choices are discussed. Finally, Sect. 5 summarizes the findings.
In this paper, we study different techniques for solving differential equations computationally. Due to the different conventions in algorithmic and analog approaches, a common language had to be found; it is described in this section. Here, the term algorithmic approach denotes the classical Euler method or classical quasi-linear techniques for ordinary or partial differential equations (ODEs/PDEs), i.e., general methods of numerical mathematics. The term analog approach denotes continuous-time integration with an operational amplifier having a capacitor in its feedback loop. The fundamental measures of computer performance under consideration are the time-to-solution T, the power consumption P and the energy demand E.
2.1 Time to solution
The time-to-solution T is the elapsed real time (lab time or wall-clock time) for solving a differential equation $\partial_t u = f(u)$ from its initial condition $u(t_0)$ at time $t_0$ to some target simulation time $t_{\mathrm{final}}$, i.e., for obtaining $u(t_{\mathrm{final}})$. The speed factor $k_0 := t_{\mathrm{final}}/T$ is the ratio of elapsed simulation time per wall-clock time. On analog computers, this allows one to identify the maximum frequency $\nu = k_0/(2\pi\,\mathrm{s})$. On digital computers, the time-to-solution is used as an estimator (in a statistical sense) for the average $k_0$. Relating this quantity to measures in numerical schemes is an important discussion point in this paper. Given the simplest possible ODE

$\mathrm{d}y/\mathrm{d}t = f(y), \quad (1)$
one can study the analog/digital computer performance in terms of the complexity of $f(y)$. For a problem M times as big as the given one, the inherently fully parallel analog computer exhibits a constant time-to-solution, i.e., in other terms,

$T_{A}^{M} = T_{A}^{1} = \mathcal{O}(1). \quad (2)$
In contrast, a single-core (i.e., non-vectorized, non-superscalar) digital computer operates in a serial fashion and can achieve a time-to-solution of at best

$T_{\mathrm{D}}^{M} = T_{\mathrm{D}}^{1}\cdot\mathcal{O}(M). \quad (3)$
Here, $T^{1}$ refers to the time-to-solution for solving Eq. (1), while $T^{M}$ refers to the time-to-solution for solving a problem M times as hard. $M\in\mathbb{N}$ is the measure of the algorithmic complexity of $f(y)$, and $f(M)=\mathcal{O}(g(M))$ refers to the Bachmann-Landau asymptotic notation. The number of computational elements required to implement $f(y)$ on an analog computer, or the number of instructions required for computing $f(y)$ on a digital computer, could provide numbers for M. This is because it is assumed that the evaluation of $f(y)$ can hardly be numerically parallelized. For a system of N coupled ODEs $\mathrm{d}y_i/\mathrm{d}t = f_i(y_1,\dots,y_N)$, the vector-valued f can be assigned an effective complexity $\mathcal{O}(NM)$ with the same reasoning. However, an overall complexity $\mathcal{O}(M)$ is more realistic, since parallelism could be exploited more easily in the direction of N (MIMD, multiple instruction, multiple data).
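As an illustration, the linear scaling model above can be sketched numerically. The function names and the baseline time `T1` below are hypothetical; only the scaling behavior, not the absolute numbers, is the point.

```python
# Sketch of the time-to-solution scaling model of Sect. 2.1.
# T1 is a hypothetical size-1 baseline time in seconds.

def time_to_solution_digital(M, T1=1e-6):
    """Serial digital computer: evaluating f(y) costs O(M), so T grows linearly."""
    return T1 * M

def time_to_solution_analog(M, T1=1e-6):
    """Fully parallel analog computer: all M elements run concurrently,
    so the time-to-solution is independent of the problem size."""
    return T1

# The analog advantage T_D^M / T_A^M grows linearly with M:
speedups = [time_to_solution_digital(M) / time_to_solution_analog(M)
            for M in (1, 10, 100)]
```

This directly mirrors Eqs. (2) and (3): the ratio of the two times is itself $\mathcal{O}(M)$.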
Furthermore, multi-step schemes implementing higher-order numerical time integration can exploit digital parallelization (however, in general, the serial time-to-solution of a numerical Euler scheme is the limit for the fastest possible digital time integration). Digital parallelization is always limited by the inherently serial parts of a problem (Amdahl's law; Rodgers, 1985), which makes the evaluation of $f(y)$ the hardest part of the problem. Section 4 discusses complex functions $f(y)$ in the context of the method of lines for PDEs.
It should be emphasized that, in the general case, this estimate for the digital computer is a most optimistic (best-case) estimate using today's numerical methods. It does not take into account hypothetical algorithmic “shortcuts” which could achieve solutions faster than $\mathcal{O}(M)$, because such shortcuts imply some knowledge about the internal structure of $f(y)$ which could probably also be exploited in analog implementations.
2.2 Power and energy scaling for the linear model
For a given problem with timetosolution T and average power consumption P, the overall energy is estimated by E=PT regardless of the computer architecture.
In general, an analog computer has to grow with the problem size M. Given constant power requirements per computing element, and neglecting increasing resistances or parasitic capacitances, one can assume the analog computer power requirement $P_{A}^{M}$ for a size-M problem to scale from that of a size-1 problem $P_{A}^{1}$ as $P_{A}^{M} = P_{A}^{1}\cdot M$. In contrast, a serial single-node digital computer can, in principle, compute a problem of any size serially by relying on dynamic memory (DRAM), i.e., $P_{\mathrm{D}}^{M} = P_{\mathrm{D}}^{1}$. That is, the digital computer power requirements for running a large problem ($P_{\mathrm{D}}^{M}$) are, to a first approximation, the same as for running a small problem ($P_{\mathrm{D}}^{1}$). Typically, the DRAM energy demands are one to two orders of magnitude smaller than those of a desktop or server-grade processor and are therefore negligible for this estimate.
Interestingly, this model suggests that the overall energy requirements to solve a large problem on an analog and on a digital computer are both $E_{\mathrm{D}}^{M} = \mathcal{O}(M)$ and $E_{A}^{M} = \mathcal{O}(M)$, i.e., the analog-digital energy ratio remains constant despite the fact that the analog computer runs linearly faster with increasing problem size M. This can easily be deduced from $E = P\cdot T$. In this model, it is furthermore

$\frac{E_{A}^{M}}{E_{\mathrm{D}}^{M}} = \frac{P_{A}^{1}\,T_{A}^{1}}{P_{\mathrm{D}}^{1}\,T_{\mathrm{D}}^{1}} = \frac{E_{A}^{1}}{E_{\mathrm{D}}^{1}} = \mathrm{const.} \quad (4)$
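The constancy of the energy ratio can be checked with a few lines of arithmetic. The baseline values below (400 mW analog, 10 W digital, equal size-1 run times) are merely illustrative placeholders, loosely inspired by the measurements reported later in Sect. 3.

```python
# Energy scaling sketch for Sect. 2.2: E = P * T on both architectures.
# P_A1, P_D1, T_A1, T_D1 are hypothetical size-1 baselines (W and s).

def energy_analog(M, P_A1=0.4, T_A1=1e-4):
    # Power grows with M (more computing elements); run time stays constant.
    return (P_A1 * M) * T_A1

def energy_digital(M, P_D1=10.0, T_D1=1e-4):
    # Power stays constant; run time grows with M (serial execution).
    return P_D1 * (T_D1 * M)

# Both energies are O(M), so their ratio is independent of M:
r1 = energy_digital(1) / energy_analog(1)
r100 = energy_digital(100) / energy_analog(100)
```

With these placeholder baselines the digital-to-analog energy ratio is 25 for every problem size, exactly as Eq. (4) predicts.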
The orthogonal performance features of the fully parallel analog computer and the fully serial digital computer are also summarized in Table 1.
When comparing digital and analog computer power consumption, the figures under consideration should include the total computer power, including administrative parts (like network infrastructure, analog-to-digital converters or cooling) and power supplies. In this work, data from heterogeneous sources are compared, and definitions may vary.
2.3 Criticism and outlook
Given that the digital and analog technologies (electric representation of information, transistor-based computation) are quite similar, the model prediction of a similarly growing energy demand is useful. Differences are, of course, hidden in the constants (prefactors) of the asymptotic notation $\mathcal{O}(M)$. The quantitative studies in the next sections examine this prefactor.
The linear model already breaks down for serial digital processors when the computation becomes memory-bound (instead of CPU-bound). Having to wait for data leads to a performance drop and might result in a worsened, superlinear $T_{\mathrm{D}}^{M}$.
Parallel digital computing as well as serial analog computing have not yet been part of the previous discussion. While the former is a widespread standard technique, the latter refers to analog-digital hybrid computing, which, inter alia, allows a small analog computer to be used repeatedly on a large problem, effectively rendering the analog part an analog accelerator or coprocessor for the digital part. Parallel digital computing suffers from a theoretical speedup limited by the non-parallel parts of the algorithm (see also Gustafson, 1988), which has a severe impact on $T_{\mathrm{D}}^{M}$. This is where the intrinsically parallel analog computer exhibits its biggest advantages. Section 4 discusses this aspect of analog computing.
In this section, quantitative measurements comparing contemporary analog and digital computers are made. We use the Analog Paradigm Model-1 computer (Ulmann, 2019, 2020), a modern modular academic analog computer, and an ordinary Intel© Whiskey Lake “ultra-low-power mobile” processor (Core i7-8565U) as a representative of a typical desktop-grade processor. Within this experiment, we solve a simple^{1} test equation $\mathrm{d}^2y/\mathrm{d}t^2 = \lambda y$ (with real-valued y and $\lambda = \pm 1$) on both the digital and the analog computer.
3.1 Time to solution
The digital computer solved the simple ordinary differential equation (ODE) with simple textbook-level scalar benchmark codes written in C and Fortran and compiled with GCC. Explicit (forward) integration methods were adopted (Euler/Runge-Kutta). The algorithm computed $N = 2\times 10^{3}$ timesteps with timestep size $\Delta t = 5\times 10^{-4}$ (see also Sect. 4 for a motivation of this timestep size). Therefore, $t_{\mathrm{final}} = N\,\Delta t = 1$. No output^{2} was written during the benchmark to ensure the best performance. The time per element update (per integration step) was roughly (45±35) ns. For statistical reasons, the computation was repeated and averaged $10^{5}$ times. Depending on the order of the integration scheme, the overall wall-clock time needed to reach the simulation time $t_{\mathrm{final}}$ was determined as $T_{\mathrm{D}} = (75\pm 45)$ µs.
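The original C/Fortran benchmark codes are not reproduced here; the following Python sketch merely illustrates the forward-Euler variant of the scheme with the step sizes quoted above (for $\lambda = -1$, the exact solution with $y(0)=1$, $y'(0)=0$ is $\cos t$).

```python
# Minimal sketch of the digital benchmark in Sect. 3.1: solve
# d^2y/dt^2 = lambda * y as a first-order system with forward Euler,
# N = 2000 steps of size dt = 5e-4 up to t_final = 1.

def integrate(lam=-1.0, dt=5e-4, n_steps=2000, y0=1.0, v0=0.0):
    y, v = y0, v0                  # y and dy/dt at the initial time
    for _ in range(n_steps):
        # Both updates use the state from the previous step (forward Euler).
        y, v = y + dt * v, v + dt * lam * y
    return y

y_final = integrate()              # for lam = -1, close to cos(1) ~ 0.5403
```

With this step size the forward-Euler result stays within about $10^{-3}$ of $\cos(1)$; the actual benchmark also used higher-order Runge-Kutta variants.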
In contrast, the equation was implemented with integrating (and, for $\lambda = -1$, negating) operational amplifiers on the Analog Paradigm Model-1. The machine approached $t_{\mathrm{final}} = 1$ in a wall-clock time $T_{A} = 1\,\mathrm{s}/k_0$, with $k_0 \in \{1, 10, 10^2, 10^3, 10^4\}$ being the integration speed factors available on the machine (Ulmann, 2019). The Analog Paradigm Model-1 thus reached the solution of $y'' = y$ at $t_{\mathrm{final}} = 1$ in a wall-clock time of $T_{A} = 100$ µs at best.
Note that $T_{A}/T_{\mathrm{D}} \approx 1$, i.e., for the smallest possible reasonable ODE, the digital computer (a 2020s energy-efficient desktop processor) is roughly as fast as the Analog Paradigm Model-1 (a modern analog computer with an integration level comparable to that of the 1970s).
Looking forward, given the limited increase in clock frequency, with a faster processor one can probably expect an improvement of $T_{\mathrm{D}}$ down to the order of 1 µs. For an analog computer on a chip, one can expect an improvement of $T_{A}$ down to the order of 1 µs to 10 ns. This renders $T_{A}/T_{\mathrm{D}} \approx 10^{-(1\pm 1)}$ as a universal constant.
Summing up, with the given numbers above, as soon as the problem complexity grows, the analog computer outperforms the digital one, and this advantage increases linearly.
3.2 Energy and power consumption
The performance-measurement tools likwid (Hager et al., 2010; Röhl et al., 2017; Gruber et al., 2020) and perf (de Melo, 2010) were used to measure the overall floating-point operations (FLOP) and the energy usage of the digital processor. For the Intel mobile processor, this yielded a power consumption of $P_{\mathrm{D}} = 10$ W during computing. This number was derived directly from the CPU performance counters. The overall energy requirement was then $E_{\mathrm{D}} = P_{\mathrm{D}} T_{\mathrm{D}} = (0.9\pm 0.6)$ mJ. Note that this number only takes the processor energy demands into account, not any other auxiliary parts of the overall digital computer (such as memory, main board or power supply). For the overall power consumption, an increase of at least 50 % is expected.
The analog computer power consumption is estimated as $P_{A} \approx 400$ mW. This number is based on measurements of actual Analog Paradigm Model-1 computing units, in particular 84 mW for a single summer and 162 mW for a single integrator. The overall energy requirement is then $E_{A} = P_{A} T_{A} = 40$ µJ.
Note that $P_{\mathrm{D}}/P_{A} \approx 25$, while $E_{\mathrm{D}}/E_{A} \approx (22.5\pm 15)$. The conclusion is that the analog computer requires roughly an order of magnitude less energy than the digital one for the given computation, a remarkable result given the 40-year technology gap between the two architectures compared here.
For power consumption, it is hard to give a useful projection due to the accumulating administrative overhead in parallel digital computing, such as data transfers, non-uniform memory accesses (NUMA) and switched networking infrastructure. It can be assumed that this will change the ratio $E_{\mathrm{D}}/E_{A}$ further in favor of the analog computer for both larger digital and larger analog computers. Furthermore, higher integration levels lower $E_{A}$: the Analog Paradigm Model-1 is realized with an integration level comparable to that of 1970s digital computers. We can reasonably expect a drop of two to three orders of magnitude in power requirements with fully integrated analog computers.
3.3 Measuring computational power: FLOP per Joule
For the digital computer, the number of computed floating-point operations (FLOP^{3}) can be measured. The overall single-core, non-vectorized performance was measured as $F \approx 1\,\mathrm{GFLOP/s}$. A single computation until $t_{\mathrm{final}}$ required roughly $F_{\mathrm{D}} = 3$ kFLOP. The ratio $F/P_{\mathrm{D}} = 100\,\mathrm{MFLOP/J}$ is a measure of the number of computations per energy unit on this machine. This performance is one to two orders of magnitude below typical HPC numbers, because an energy-saving desktop CPU, not a high-end processor, was benchmarked. Furthermore, this benchmark was deliberately single-threaded.
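The FLOP-per-Joule figure above is plain arithmetic on the quoted measurements; the following sketch simply restates it (all values are taken from the text, none are additional measurements).

```python
# Digital-side numbers from Sect. 3.3:
F = 1e9                    # sustained single-core FLOP/s measured for the benchmark
P_D = 10.0                 # processor power in W

# FLOP per Joule: (FLOP/s) / W = FLOP/J.
flop_per_joule = F / P_D   # = 1e8, i.e. the 100 MFLOP/J quoted in the text
```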
In this non-vectorized benchmark, the reduced resolution of the analog computer was ignored; in fact, it is slightly lower than IEEE 754 half-precision floating-point, compared to the double-precision floating-point numbers used in the digital benchmark. One can then assign the analog computer a time-equivalent floating-point operation performance
The analog computer FLOP-per-Joule ratio (note that $\mathrm{FLOP/J} = (\mathrm{FLOP/s})/\mathrm{W}$) is
That is, the analog computer's “FLOP per Joule” figure is slightly larger than the digital one's. Furthermore, one can expect an increase of $F_{A}/E_{A}$ by a factor of 10–100 for an analog computer chip. See for instance Cowan (2005) and Cowan et al. (2005), who claim $20\,\mathrm{GFLOP/s}$. We expect $300\,\mathrm{GFLOP/s}$ to be more realistic, though (Table 2).
Keep in mind that the FLOP/s and FLOP/J measures are (even when comparing two digital computers) always problem- and algorithm-specific (in this case, a Runge-Kutta solver for $y'' = y$) and are therefore controversial as comparative figures.
This section presents forecasts about the solution of large-scale differential equations. No benchmarks have been carried out, because a suitable integrated analog computer on a chip does not yet exist. For the estimates, an analog computer on a chip with an average power consumption of about $P_{N} = 4$ mW per computing element (i.e., per integrator, multiplier, etc.) and a maximum frequency $\nu = 100$ MHz, referred to as the analog maximum frequency $\nu^{A}$ in the following, was assumed.^{4} These numbers are several orders of magnitude better than the $P_{N} = 160$ mW and $\nu = 100$ kHz of the Analog Paradigm Model-1 computer discussed in the previous section. For the digital part, different systems than before are considered.
In general, the bandwidth of an analog computer depends on the frequency-response characteristics of its elements, such as summers and integrators. The actually achievable performance also depends on the technology. A few examples shall be given to motivate our numbers: in 65 nm CMOS technology, bandwidths of over 2 GHz are achievable with integrators (Breems et al., 2016), and integrators with unity-gain frequencies of 800 MHz to 1.2 GHz are achievable at a power consumption of less than 2 mW (Wang et al., 2018).
4.1 Solving PDEs on digital and analog computers
Partial differential equations (PDEs) are among the most important and powerful mathematical frameworks for describing dynamical systems in science and engineering. PDE solutions are usually fields $\mathit{u} = \mathit{u}(\mathit{r}, t)$, i.e., functions^{5} of spatial position r and time t. In the following, we concentrate on initial value boundary problems (IVBP). These problems are described by a set of PDEs valid within a spatial and temporal domain, complemented with field values imposed on the domain boundary. For a review of PDEs, their applications and solutions, see for instance Brezis and Browder (1998). In this text, we use computational fluid dynamics (CFD) as a representative theory for discussing general PDE performance. In particular, classical hydrodynamics (the Euler equations) in a flux-conservative formulation is described by hyperbolic conservation laws in the next sections. Such PDEs have a long tradition of being solved with highly accurate numerical schemes.
Many methods exist for the spatial discretization. While finite-volume schemes are popular for their conservation properties, finite-difference schemes are in general cheaper to implement. In this work, we stick to simple finite differences on a uniform grid with uniform grid spacing $\Delta r$. The evolution vector field $\mathit{u}(\mathit{r}, t)$ is sampled on G grid points per dimension and thus replaced by $u_k(t)$ with $0 \le k < G$. It is worth mentioning that this approach works in the classical orthogonal “dimension by dimension” fashion, so the total number of grid points is $G^D$. The computational domain is thus bounded by $\Omega = [\mathit{r}_0, \mathit{r}_G]^D$. A spatial derivative $\partial_i f$ is then approximated by a central finite-difference scheme, for instance $\partial_i f_k \approx (f_{k+1} - f_{k-1})/(2\Delta x) + \mathcal{O}(\Delta x^2)$ for a second-order accurate central finite-difference approximation of the derivative of some function f at grid point k.
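The second-order accuracy of the central stencil can be verified numerically: halving the grid spacing should reduce the error by roughly a factor of four. This is a generic convergence check, not part of the benchmark setup.

```python
import numpy as np

# Second-order central finite difference on a uniform grid (Sect. 4.1):
# d_i f_k ~ (f_{k+1} - f_{k-1}) / (2 dx), with O(dx^2) truncation error.

def central_diff(f, dx):
    """Central difference of the samples f; interior points only."""
    return (f[2:] - f[:-2]) / (2.0 * dx)

def max_error(n):
    """Max error of the stencil for f(x) = sin(x) on n grid points in [0, 1]."""
    x = np.linspace(0.0, 1.0, n)
    dx = x[1] - x[0]
    approx = central_diff(np.sin(x), dx)
    return np.max(np.abs(approx - np.cos(x[1:-1])))

e1, e2 = max_error(101), max_error(201)   # dx = 0.01 and dx = 0.005
```

The observed error ratio `e1/e2` is close to 4, confirming the quoted $\mathcal{O}(\Delta x^2)$ behavior.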
Many algorithmic solvers implement numerical schemes which exploit the vertical method of lines (MoL) to rewrite the PDE into coupled ordinary differential equations (ODEs). Once applied, the ODE system can be written as $\partial_t u^k = G^k(\mathit{u}, \nabla\mathit{u})$, with $u^k$ denoting the time-evolved (spatial) degrees of freedom and $G^k$ functions containing spatial derivatives ($\partial_i u^j$) and algebraic sources. A standard time-stepping method determines a solution $u(t_1)$ at a later time $t_1 > t_0$ by basically integrating $u^k(t_1) = \int_{t_0}^{t_1} G^k(\mathit{u}(t))\,\mathrm{d}t + u^k(t_0)$. Depending on the details of the scheme, $G^k$ is evaluated (possibly repeatedly, or in a weak-form integral approach) during the time integration of the system. Note, however, that other integration techniques exist, such as the arbitrary high-order ADER technique (Titarev and Toro, 2002, 2005). The particular spatial discretization method has a big impact on the computational cost of $G^k$. Here, we focus on the (simplest) finite-difference technique, where the number of neighbor communications per dimension grows linearly with the convergence order of the scheme.
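A minimal method-of-lines sketch, using the 1-D linear advection equation $\partial_t u = -a\,\partial_x u$ rather than the Euler system (the grid size, wave speed and step sizes below are illustrative choices, not values from the text):

```python
import numpy as np

# Method of lines (Sect. 4.1): semi-discretize u_t = -a u_x with central
# differences and periodic boundaries, then integrate the resulting ODE
# system du_k/dt = G_k(u) with forward Euler.

def G(u, a, dx):
    """Right-hand side G_k(u): central-difference approximation of -a u_x."""
    return -a * (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)

def evolve(u, a, dx, dt, n_steps):
    for _ in range(n_steps):
        u = u + dt * G(u, a, dx)   # one forward-Euler timestep
    return u

n = 200
x = np.linspace(0.0, 1.0, n, endpoint=False)
u0 = np.sin(2 * np.pi * x)
u1 = evolve(u0, a=1.0, dx=1.0 / n, dt=1e-4, n_steps=1000)   # evolve to t = 0.1
```

For this smooth initial profile the result stays close to the exact translated solution $\sin(2\pi(x - 0.1))$; production codes would of course use a stable higher-order time integrator instead of plain forward Euler.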
4.2 Classical Hydrodynamics on analog computers
The broad class of fluid dynamics will be discussed as a popular yet simple type of PDE. It is well known for its efficient description of liquids and gases in motion and is applicable in many domains such as aerodynamics, the life sciences and fundamental science (Sod, 1985; Chu, 1979; Wang et al., 2019). In this text, the simplest formulation is investigated: Newtonian hydrodynamics (also referred to as the Euler equations) with an ideal-gas equation of state. It is given by a nonlinear PDE describing the time evolution of a mass density ρ, its velocity $v^i$, momentum $p^i = \rho v^i$ and energy $e = t + \epsilon$, with the kinetic contribution $t = \rho\,v^2/2$ and an “internal” energy ε, which can account for forces on smaller length scales than the averaged scale.
Flux-conservative Newtonian hydrodynamics with an ideal-gas equation of state is one of the most elementary, textbook-level formulations of fluid dynamics (Toro, 1998; Harten, 1997; Hirsch, 1990). The PDE system can be written in a dimension-agnostic way in D spatial dimensions (i.e., independently of the particular choice of D) as

$\partial_t \begin{pmatrix} \rho \\ p^i \\ e \end{pmatrix} = -\partial_j \begin{pmatrix} p^j \\ p^i v^j + p\,\delta^{ij} \\ (e + p)\,v^j \end{pmatrix} + \mathit{S} \quad (7)$
with $i, j \in [1, D]$. Here, the pressure $p = \rho\epsilon(\Gamma - 1)$ defines the ideal-gas equation of state, with adiabatic index $\Gamma = 2$, and $\delta^{ij}$ is the Kronecker delta. A number of vectors are important in the following: the integrated state or evolved vector u, in contrast to the primitive state vector or auxiliary quantities $\mathit{v}(u) = (p, v^i)$, which is a collection of so-called locally reconstructed quantities. Furthermore, the right-hand sides in Eq. (7) do not explicitly depend on the spatial derivative $\partial^i\rho$; thus the conserved flux vector $\mathit{f} = \mathit{f}(\nabla\mathit{q}, \mathit{v})$ is only a function of the derivatives of the communicated quantities $\mathit{q} = (e, p^i)$ and the auxiliaries v. Both q and v are functions of u only.
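As a concrete instance, the one-dimensional flux vector of the flux-conservative Euler equations can be written out for a conserved state $u = (\rho, \rho v, e)$. The pressure recovery below uses the standard ideal-gas closure on the internal energy density $e - \rho v^2/2$, which is an assumption about the intended normalization of ε in the text; Γ = 2 as quoted.

```python
import numpy as np

# 1-D flux f(u) of the flux-conservative Euler equations (Sect. 4.2),
# conserved state u = (rho, rho*v, e), ideal-gas closure with Gamma = 2.

GAMMA = 2.0

def primitive(u):
    """Auxiliary (locally reconstructed) quantities v = (p, v)."""
    rho, mom, e = u
    v = mom / rho
    p = (GAMMA - 1.0) * (e - 0.5 * rho * v**2)   # ideal-gas pressure
    return p, v

def flux(u):
    rho, mom, e = u
    p, v = primitive(u)
    return np.array([rho * v,        # mass flux
                     mom * v + p,    # momentum flux
                     (e + p) * v])   # energy flux

u = np.array([1.0, 0.5, 2.0])        # rho = 1, v = 0.5, e = 2
f = flux(u)
```

This is exactly the subcircuit an analog implementation has to realize per grid point and per direction: an auxiliary recovery (division, multiplication) followed by a handful of multiply-add operations.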
$\mathit{S} = 0$ is a source term. Some hydrodynamical models can be coupled in merely by choosing a nonzero S, such as the popular Navier-Stokes equations, which describe viscous fluids. The compressible Navier-Stokes equations can be written with a source term $\mathit{S} = \nabla\cdot\mathit{F}^{v}$, with
with specific heats $c_p$, $c_v$, viscosity coefficient μ, Prandtl number Pr and temperature T determined by the perfect-gas equation of state, i.e., $T = (e - \mathit{v}^2)/(2 c_v)$. The computational cost roughly doubles when going from the Euler equations to the Navier-Stokes equations. Furthermore, the partial derivatives of the velocities and temperatures also double the number of quantities which must be communicated with each neighbor in every dimension. We use the Euler equations in the following section for the sake of simplicity.
4.3 Spatial discretization: Trading interconnections vs. computing elements
Schemes of (convergence) order F shall be investigated, which require communication with F neighbor elements. For instance, an F=4th-order accurate stencil has to communicate and/or compute four neighboring elements $\mathit{f}_{k-2}, \mathit{f}_{k-1}, \mathit{f}_{k+1}, \mathit{f}_{k+2}$. Typically, long-term evolutions are carried out with F=4 or F=6. In the following, for simplicity, a second-order stencil (F=2) is chosen. One identifies three different subcircuits
with $\mathit{f}_{k\pm 1} := \mathit{f}_k(\mathit{q}_{k\pm 1}, \mathit{v}_k)$ and $\mathit{v}_k := \mathit{v}_k(\mathit{u}_k)$ according to their previous respective definitions. Figure 1 shows this “building block” for a single grid point, exemplarily for up to D=2 dimensions with an F=2nd-order finite-difference stencil. The circuit identifies a number of intermediate expressions, which are labeled in the following equations:
Just as in Fig. 1, all expressions which vanish in a single spatial dimension are colored red. Furthermore, note how the index i denotes the x-direction and k the y-direction, and that there are different fluxes $f^j$ in the particular directions. Equation (13) is closed with the element-local auxiliary recovery
Note that one can trade neighbor communication (i.e., the number of wires between grid points) for local recomputation. For instance, it would be mathematically clean to communicate only the conserved quantities u and reconstruct v whenever needed. In order to avoid too many recomputations, some numerical codes also communicate parts of v. In an analog circuit, it is even possible to communicate parts of the finite differences, such as the $\Delta v_{i,k}$ quantities in Eq. (13).
The number of analog computing elements required to solve the Euler equations at a single grid point is determined as $N_{\mathrm{single}} = 5D + 5F(D+2) + 9$, with D being the number of spatial dimensions and F the convergence order (i.e., essentially the finite-difference stencil size). Typical choices of interest are convergence orders $F \in [2, 6]$ in $D \in [1, 3]$ spatial dimensions. Inserting the averaged $F = 3\pm 1$ and $D = 2\pm 1$ into $N_{\mathrm{single}}$ yields an average of $N_{\mathrm{single}} \approx (84\pm 40)$ computing elements per spatial degree of freedom (grid point) required for implementing the Euler equations.
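The element count and its quoted spread follow directly from the formula; evaluating it at the endpoints of the averaged ranges $F = 3\pm 1$, $D = 2\pm 1$ reproduces the figure of $(84\pm 40)$ elements:

```python
# Element count per grid point for the Euler circuit (Sect. 4.3):
# N_single = 5*D + 5*F*(D + 2) + 9, with D spatial dimensions and
# convergence order F (the finite-difference stencil size).

def n_single(D, F):
    return 5 * D + 5 * F * (D + 2) + 9

n_lo = n_single(D=1, F=2)     # lower end of the averaged ranges: 44 elements
n_hi = n_single(D=3, F=4)     # upper end: 124 elements
avg = (n_lo + n_hi) / 2       # 84, matching N_single ~ (84 +/- 40)

p_per_point = avg * 4e-3      # with P_N = 4 mW per element: ~0.336 W per grid point
```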
Unfortunately, this circuit is too big to fit on the Analog Paradigm Model-1 computer resources available. Consequently, the following discussion is based on a future implementation using a large number of interconnected analog chips. It is noteworthy that this level of integration is necessary to implement large-scale analog computing applications. With $P_N = 4$ mW per computing element, the average power per spatial degree of freedom (i.e., per grid point) is $P_{N\mathrm{D}} = (336\pm 160)$ mW.
4.4 Time to solution
Numerical PDE solvers are typically benchmarked using a wall-clock time per degree-of-freedom update measure $T_{\mathrm{DOF}}$, where an element update typically means one time-integration timestep. In this measure, the overall wall-clock time is normalized (divided) by the number of spatial degrees of freedom as well as by the number of parallel processors involved.
The fastest digital integrators found in the literature achieve a time per degree-of-freedom update of $T_{\mathrm{DOF}} = 10^{1\pm 1}$ µs. Values smaller than 1 µs already require the use of sophisticated communication-avoiding numerical schemes such as discontinuous Galerkin (DG) schemes.^{6} For instance, Dumbser et al. (2008) demonstrate the superiority of so-called $P_N P_M$ methods (polynomial of degree N for reconstruction and M for time integration, where the limit $P_0 P_M$ denotes a standard high-order finite-volume scheme) by reporting $T_{\mathrm{DOF}} = 0.8$ µs for a $P_2 P_2$ method when solving the two-dimensional Euler equations. Diot et al. (2013) report an adaptive scheme which performs no faster than $T_{\mathrm{DOF}} = 30$ µs when applied to the three-dimensional Euler equations. The predictor-corrector arbitrary-order ADER scheme applied by Köppel (2018) and Fambri et al. (2018) to the general-relativistic magnetohydrodynamic extension of hydrodynamics reported $T_{\mathrm{DOF}} = 41$ µs as the fastest speed obtained. The non-parallelizable evaluation of more complex hydrodynamic models is clearly reflected in the increasing times $T_{\mathrm{DOF}}$.
Recalling the benchmark result of $T_{\mathrm{DOF}} \sim 45$ ns from Sect. 3.1, the factor of roughly 1000 is mainly caused by the inevitable communication required for obtaining neighbor values when solving $f(y, \nabla y)$ in $\partial_t y = f(y)$. Switched networks have an intrinsic communication latency, and one cannot expect $T_{\mathrm{DOF}}$ to shrink significantly, even for newer generations of supercomputers. A key advantage of analog computing is that grid-neighbor communication happens continuously, in the same time as the computation in the grid-local circuit. That is, no time is lost on communication.
One can do a comparison with the analog computer without knowing the simulation time step size Δt. The reasoning is based on the maximum frequency, i.e., the shortest wavelength which can be resolved with a (first order in time^{7}) numerical scheme is ${f}_{\mathrm{sim}}:=\mathrm{1}/\left(\mathrm{10}\mathrm{\Delta}t\right)$, c.f., Fig. 2. The factor $\mathrm{10}=\mathrm{2}\cdot \mathrm{5}$ includes a factor of 2 due to the NyquistShannon sampling theorem, while the factor of 5 is chosen to take into account that a numerical scheme can marginally reconstruct a wave at frequency $f=\mathrm{1}/\left(\mathrm{2}\mathrm{\Delta}t\right)$ by two points while it can be obtained perfectly by the analog computer (down to machine precision without any artifacts). The integration of signals beyond the maximum frequency results in a nonlinear response which heavily depends on the electrical details of the circuit (which are beyond the scope of the analog computer architecture discussed in this paper). One can demand that the numerical integrator time resolution is good enough to reconstruct a signal without prior knowledge on the wave form even at the maximum frequency.^{8} This drives the demand for 5 additional sampling points per halfwave, in order to make analog and digital outcome comparable (see also Fig. 2).
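The cutoff criterion amounts to a one-line formula (the function name is ours; the factors follow the text):

```python
def max_resolved_frequency(dt, nyquist=2, oversampling=5):
    """Highest frequency a first-order-in-time scheme with timestep dt
    can faithfully reconstruct: f_sim = 1 / (2 * 5 * dt).

    The factor 2 is the Nyquist-Shannon bound; the factor 5 adds the
    sampling points per half-wave demanded for a faithful wave form.
    """
    return 1.0 / (nyquist * oversampling * dt)

# A 1 us timestep resolves signals only up to roughly 100 kHz
f = max_resolved_frequency(1e-6)
print(f"{f / 1e3:.0f} kHz")
```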
Note that this argument is relevant as long as one is interested in obtaining and preserving the correct time evolution (of a system described by the differential equation) with an analog or digital computer, respectively. In general, it is not valid to relax the computational correctness within the solution domain of an initial value problem, as this would invalidate any later part of the solution.
By assigning the numerical PDE solver a maximum frequency identical to the highest frequency which can be evolved by the scheme in a given wall-clock time, one introduces an effective digital computer maximum frequency ${\mathit{\nu}}^{\mathrm{D}}:=\mathrm{1}/\left(\mathrm{10}{T}_{\mathrm{DOF}}\right)$.
Note how the mapping of simulation time (interval) Δt to wallclock time (interval) T_{DOF} results in a mapping of simulation frequency f_{sim} to wallclock (or realtime) frequency ν^{D} (Fig. 2).
The calculated ${\mathit{\nu}}^{\mathrm{D}}={\mathrm{10}}^{\mathrm{2}\pm \mathrm{1}}$ kHz has to be contrasted with ν^{A}=100 MHz of the analog computer chip. One can conclude that analog computers can solve large scale high performance computing problems at least ${\mathit{\nu}}^{A}/{\mathit{\nu}}^{\mathrm{D}}={\mathrm{10}}^{\mathrm{3}\pm \mathrm{1}}$ times faster than digital ones. With T_{A} and T_{D} denoting the analog and digital times to solution and $T\sim \mathrm{1}/\mathit{\nu}$, the time to solution shrinks accordingly, ${T}_{\mathrm{D}}/{T}_{A}={\mathrm{10}}^{\mathrm{3}\pm \mathrm{1}}$.
This is a remarkable result, as it already assumes the fastest numerical integration schemes running on a perfectly scaling parallel digital computer. In practical problems, these assumptions are hardly ever met: the impossibility of (ideal) parallelization is one of the major drawbacks of digital computing. Nevertheless, the above results show that even without these drawbacks, the analog computer is orders of magnitude faster. Notably, while a high-performance computer requires careful adjustment of both the problem and the code to achieve acceptable parallel performance, with an analog computer these advantages come effortlessly. The only way to lose the speed advantage is to choose a disadvantageous or unsuitable number scaling.
In this study, the low resolution of an analog computer (which is effectively that of IEEE 754 half precision floating-point) has been neglected. In fact, high order time integration schemes can invest computing time in order to achieve machine level accuracy, with a typical error $\mathrm{\Delta}{f}_{\mathrm{digital}}\sim {\mathrm{10}}^{-\mathrm{10}}$ on some evolved function or field f and the error definition $\mathrm{\Delta}{f}_{\mathrm{simulation}}:=({f}_{\mathrm{simulation}}-{f}_{\mathrm{exact}})/{f}_{\mathrm{exact}}$. An analog computer is limited by its intrinsic accuracy, with a typical error $\mathrm{\Delta}{f}_{\mathrm{analog}}\sim {\mathrm{10}}^{-(\mathrm{4}\pm \mathrm{1})}$ (averaging over the Analog Paradigm Model-1 and future analog computers on chip).
4.5 Energy and power consumption
One expects the enormous speedup ${T}_{\mathrm{D}}/{T}_{A}$ of the analog computer to result in a much lower energy budget, ${E}_{\mathrm{D}}=({T}_{\mathrm{D}}/{T}_{A}){E}_{A}={\mathrm{10}}^{\mathrm{3}\pm \mathrm{1}}{E}_{A}$, for a given problem. However, as the power requirement is proportional to the analog computer size, P_{A}=NP_{ND}, the problem size (number of grid points) which can be handled by the analog computer is limited by the overall power consumption. For instance, with a typical high performance computer power consumption of P_{A}=20 MW, one can simultaneously evolve a grid with $N={P}_{A}/{P}_{N\mathrm{D}}={\mathrm{10}}^{\mathrm{11}\pm \mathrm{0.5}}$ points. This is of the same order of magnitude as the largest computational fluid dynamics simulations evolved on digital high performance computer clusters (cf. the Green 500 list; Subramaniam et al., 2013, 2020). Note that in such a setup, the solution is obtained on average 10^{3±1} times faster with a purely analog computer, and consequently the energy demand is also 10^{3±1} times lower.
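As a sketch of this power bound (the per-node power P_ND used here is not stated explicitly in the text but is inferred from the 20 MW budget and N = 10^{11}, so it is an assumption of this estimate):

```python
# Power-bound problem size: the grid must fit within the power budget.
# P_ND, the analog power per grid node, is an assumed value inferred
# from the text's figures: 20 MW / 10^11 nodes = 0.2 mW per node.
P_A = 20e6     # W, cluster-grade power budget
P_ND = 0.2e-3  # W per analog grid node (assumed)

N = P_A / P_ND  # largest grid evolvable within the budget
print(f"N = {N:.1e} grid points")  # N = 1.0e+11 grid points
```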
To depict an analog computer of this size: given 1000 computing elements per chip, 1000 chips per rack unit and 40 units per rack, one still requires 2500 racks to build such a computer in a traditional design. This is one order of magnitude larger than a typical high performance computing center. Clearly, at such a size the interconnections will also have a considerable power consumption, even if the monumental engineering challenges of such large scale interconnections can be met. On a logical level, interconnections are mostly wires and switches (which require little power compared to computing elements). This can change dramatically with level converters; a detailed energy estimate is beyond the scope of this work.
4.6 Hybrid techniques for trading power vs. time
The analog computers envisaged so far have to grow with problem size (i.e., with grid size, but also with equation complexity). Modern chip technology could theoretically make it possible to build a computer with 10^{12} analog computing elements, which is many orders of magnitude larger than any analog computer built so far (about 10^{3} computing elements at most). The idea of combining an analog and a digital computer, thus forming a hybrid computer featuring analog and digital computing elements, is not new. With digital memory and algorithmically controlled program flow, a small analog computer can be applied repeatedly to a larger problem under the control of the digital computer it is mated to. Many attempts at solving PDEs on hybrid computers utilized the analog computer for computing the element-local updated state, with the digital computer looping over the spatial degrees of freedom. In such a scheme, the analog computer fulfils the role of an accelerator or coprocessor. Such attempts are the subject of various historical (such as Nomura and Deiters, 1968; Reihing, 1959; Vichnevetsky, 1968, 1971; Volynskii and Bukham, 1965; Bishop and Green, 1970; Karplus and Russell, 1971; Feilmeier, 1974) and contemporary studies (for instance Amant et al., 2014; Huang et al., 2017).
A simple back-of-the-envelope estimate for a modern hybrid computer tackling the N=10^{11} problem is given below. The aim is to trade the sheer number of computing elements, and with it the electrical power P, against the solution time T. It is assumed that the analog-digital hybrid scheme works similarly to numerical parallelization: the simulation domain with N degrees of freedom is divided into Q parts which can be evolved independently to a certain degree (for instance in a predictor-corrector scheme). This allows the use of a smaller analog computer which only needs to evolve $N/Q$ degrees of freedom at a time. While the power consumption of such a computer is reduced to ${P}_{A}\to {P}_{A}/Q$, the time to solution increases to T_{A}→QT_{A}. Of course, the overall required energy remains the same, ${E}_{A}={P}_{A}{T}_{A}=({P}_{A}/Q)\left(Q{T}_{A}\right)$.
In this simple model, the energy consumption of the digital part of the hybrid computer, as well as the numerical details of the analog-digital hybrid scheme, have been neglected. This includes the time-to-solution overhead introduced by the numerical scheme implemented on the digital computer (negligible for reasonably small Q) and the power demands of the ADC/DAC (analog-to-digital/digital-to-analog) converters (an overhead which scales with $(D+\mathrm{2}){G}^{\mathrm{D}}/Q$, i.e., with the state vector size per grid element).
Given a fixed four orders of magnitude speed difference ${\mathit{\nu}}^{A}/{\mathit{\nu}}^{\mathrm{D}}={\mathrm{10}}^{\mathrm{4}}$ and a given physical problem with grid size N=10^{11}, one can build an analog-digital hybrid computer which requires less power and is reasonably small, so that the overall computation is basically still done in the analog domain and digital effects do not dominate. For instance, with Q chosen just as big as $Q={\mathit{\nu}}^{A}/{\mathit{\nu}}^{\mathrm{D}}$, the analog computer would evolve only $N/Q={\mathrm{10}}^{\mathrm{7}}$ points at a time, but run 10^{4} times “in repetition”. The required power is reduced from cluster-grade to desktop-grade, ${P}_{A}=(N/Q){P}_{N\mathrm{D}}=\mathrm{3.3}$ kW. The runtime advantage is of course lost, ${T}_{\mathrm{D}}/{T}_{A}={\mathit{\nu}}^{A}/\left(Q{\mathit{\nu}}^{\mathrm{D}}\right)=\mathrm{1}$.
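The trade can be sketched numerically (the function name is ours, and the per-node power value of about 0.33 mW is an assumption chosen to reproduce the 3.3 kW figure quoted in the text):

```python
def hybrid_tradeoff(n_grid, q, p_node, t_solution):
    """Evolve a grid of n_grid points in q sequential parts.

    Power shrinks by a factor q, time to solution grows by q,
    and the energy E = (P/q) * (q*T) = P*T is invariant.
    """
    power = (n_grid / q) * p_node  # W: analog elements active at once
    time = q * t_solution          # s: q sequential sub-evolutions
    return power, time, power * time

# Q = 10^4 partitions of an N = 10^11 grid at ~0.33 mW per node (assumed)
p_full, t_full, e_full = hybrid_tradeoff(1e11, 1, 0.33e-3, 1.0)
p_part, t_part, e_part = hybrid_tradeoff(1e11, 1e4, 0.33e-3, 1.0)
print(f"full: {p_full / 1e6:.1f} MW, partitioned: {p_part / 1e3:.1f} kW")
```

The printout contrasts the cluster-grade full machine with the desktop-grade partitioned one; the returned energies agree, illustrating that only power and time are traded, never energy.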
Naturally, this scenario can also be applied to solve larger problems with a computer of given size. For instance, given an analog computer sized for N=10^{11} grid points, one can solve a grid of size QN by successively evolving the Q parts of the grid on the same computer, with the same power P_{A} as for a grid of size N. Of course, the overall time to solution and energy grow with Q. In any case, time and energy remain (3±1) orders of magnitude lower than for a purely digital solution.
In Sect. 2, we have shown that the time and power needs of analog computers are orthogonal to those of digital computers. In Sect. 3, we performed an actual miniature benchmark of a commercially available Analog Paradigm Model-1 computer versus a mobile Intel processor. The results are remarkable in several ways: the modern Analog Paradigm Model-1 analog computer uses integrated circuit technology comparable to the digital integration level of the 1970s. Nevertheless, it achieves competitive results in computational power and energy consumption compared to a mature, cutting-edge digital processor architecture developed by one of the largest companies in the world. We also computed a problem-dependent effective FLOP/s value for the analog computer. For the key performance measure of energy-efficient computing, namely FLOP per Joule, the analog computer again obtains remarkable results.
Note that while FLOP/s is a popular measure in scientific computing, it is always application- and algorithm-specific. Other measures exist, such as traversed edges per second (TEPS) or synaptic updates per second (SUPS). Cockburn and Shu (2001), for instance, propose to measure the efficiency of a PDE solving method by computing the inverse of the product of the (spatial-volume integrated) L^{1} error and the computational cost in terms of time to solution or invested resources.
In Sect. 4, large scale applications were discussed using the example of fluid dynamics, comparing high performance computing results with a prospective analog computer chip architecture. Large scale analog applications can become power-bound and thus require the adoption of analog-digital hybrid architectures. Nevertheless, with their 𝒪(1) runtime scaling, analog computers excel at time-integrating large coupled systems where algorithmic approaches suffer from communication costs. We predict outstanding advantages in terms of time to solution when it comes to large scale analog computing. Given the advent of chip-level analog computing, a gigascale analog computer (a device with ∼10^{9} computing elements) could become a game changer in this decade. Of course, major obstacles have to be overcome to realize such a computer, such as implementing the interconnection topology in an (energy) efficient manner.
Furthermore, there are a number of different approaches in the field of partial differential equations which might be even better suited to analog computing. For instance, solving PDEs with artificial intelligence has become a fruitful research field in the last decade (see for instance Michoski et al., 2020; Schenck and Fox, 2018), and analog neural networks might be an interesting candidate to challenge digital approaches. Number representation on analog computers can be nontrivial when the dynamical range is large. This is frequently the case in fluid dynamics, where large density fluctuations are one reason why perturbative solutions fail and numerical simulations are carried out in the first place. One reason why indirect alternative approaches such as neural networks could be better suited than direct analog computing networks is that they avoid this problem. Furthermore, the demand for high accuracy in fluid dynamics cannot easily be fulfilled by low resolution analog computing. In the end, it is quite possible that a small analog neural network might outperform a large classical pseudo-linear time evolution in terms of time to solution and energy requirements. Most of these engineering challenges have not been discussed in this work and are the subject of future studies.
The software code is available in the Supplement.
No data sets were used in this article.
The supplement related to this article is available online at: https://doi.org/10.5194/ars191052021supplement.
BU performed the analog simulations. SK carried out the numerical simulations and the estimates. DK implements the concept for an integrated analog computer. All authors contributed to the article.
The authors declare that they have no conflict of interest.
Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the special issue “Kleinheubacher Berichte 2020”.
We thank our anonymous referees for helpful comments and corrections. We further thank Chris Giles for many corrections and suggestions which improved the text considerably.
This paper was edited by Madhu Chandra and reviewed by two anonymous referees.
Amant, R., Yazdanbakhsh, A., Park, J., Thwaites, B., Esmaeilzadeh, H., Hassibi, A., Ceze, L., and Burger, D.: Generalpurpose code acceleration with limitedprecision analog computation, in: ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), vol. 42, pp. 505–516, https://doi.org/10.1109/ISCA.2014.6853213, 2014. a
Bishop, K. and Green, D.: Hybrid Computer Implementation of the Alternating Direction Implicit Procedure for the Solution of Two-Dimensional, Parabolic, Partial-Differential Equations, AIChE Journal, 16, 139–143, https://doi.org/10.1002/aic.690160126, 1970. a
Bournez, O. and Pouly, A.: A Survey on Analog Models of Computation, in: Handbook of Computability and Complexity, Springer, Cham, pp. 173–226, https://doi.org/10.1007/9783030592349_6, 2021. a
Breems, L., Bolatkale, M., Brekelmans, H., Bajoria, S., Niehof, J., Rutten, R., OudeEssink, B., Fritschij, F., Singh, J., and Lassche, G.: A 2.2 GHz ContinuousTime Delta Sigma ADC With −102 dBc THD and 25 MHz Bandwidth, IEEE J. SolidSt. Circ., 51, 2906–2916, https://doi.org/10.1109/jssc.2016.2591826, 2016. a
Brezis, H. and Browder, F.: Partial Differential Equations in the 20th Century, Adv. Math., 135, 76–144, https://doi.org/10.1006/aima.1997.1713, 1998. a
Calude, C. S., Păun, G., and Tătârâm, M.: A Glimpse into natural computing. Centre for Discrete Mathematics and Theoretical Computer Science, The University of Auckland, New Zealand, available at: https://www.cs.auckland.ac.nz/research/groups/CDMTCS/researchreports/download.php?selectedid=93 (last access: 2 August 2021), 1999. a
Chu, C.: Numerical Methods in Fluid Dynamics, Adv. Appl. Mech., 18, 285–331, https://doi.org/10.1016/S00652156(08)702692, 1979. a
Cockburn, B. and Shu, C.W.: Runge–Kutta Discontinuous Galerkin Methods for ConvectionDominated Problems, J. Sci. Comput., 16, 173–261, https://doi.org/10.1023/A:1012873910884, 2001. a, b
Cowan, G., Melville, R. C., and Tsividis, Y. P.: A VLSI analog computer/math coprocessor for a digital computer, ISSCC Dig. Tech. Pap. I, San Francisco, CA, USA, 10–10 February 2005, vol. 1, pp. 82–586, https://doi.org/10.1109/ISSCC.2005.1493879, 2005. a
Cowan, G. E. R.: A VLSI analog computer/math coprocessor for a digital computer, PhD thesis, Columbia University, available at: http://www.cisl.columbia.edu/grads/gcowan/vlsianalog.pdf (last access: 2 August 2021), 2005. a
Dahlquist, G. and Jeltsch, R.: Generalized disks of contractivity for explicit and implicit RungeKutta methods, Royal Institute of Technology, Stockholm, Sweden, available at: https://www.sam.math.ethz.ch/sam_reports/reports_final/reports2008/200820.pdf (last access: 2 August 2021), 1979. a
Deaton, R., Garzon, M., Rose, J., Franceschetti, D., and Stevens, S.: DNA Computing: A Review, Fund. Inform., 35, 231–245, https://doi.org/10.3233/FI199835123413, 1998. a
de Melo, A. C.: The New Linux 'perf' Tools, Tech. rep., available at: http://vger.kernel.org/~acme/perf/lk2010perfpaper.pdf (last access: 27 July 2021), 2010. a
Diot, S., Loubère, R., and Clain, S.: The MOOD method in the threedimensional case: VeryHighOrder Finite Volume Method for Hyperbolic Systems, Int. J. Numer. Meth. Fl., 73, 362–392, https://doi.org/10.1002/fld.3804, 2013. a
Dumbser, M., Balsara, D. S., Toro, E. F., and Munz, C.D.: A unified framework for the construction of onestep finite volume and discontinuous Galerkin schemes on unstructured meshes, J. Comput. Phys., 227, 8209–8253, https://doi.org/10.1016/j.jcp.2008.05.025, 2008. a
Fambri, F., Dumbser, M., Köppel, S., Rezzolla, L., and Zanotti, O.: ADER discontinuous Galerkin schemes for generalrelativistic ideal magnetohydrodynamics, Mon. Not. R. Astron. Soc., 477, 4543–4564, https://doi.org/10.1093/mnras/sty734, 2018. a
Feilmeier, M.: Hybridrechnen, Springer, Basel, https://doi.org/10.1007/9783034854900, 1974. a
Georgescu, I. M., Ashhab, S., and Nori, F.: Quantum simulation, Rev. Mod. Phys., 86, 153–185, https://doi.org/10.1103/revmodphys.86.153, 2014. a
Gruber, T., Eitzinger, J., Hager, G., and Wellein, G.: LIKWID 5: Lightweight Performance Tools, Zenodo [data set], https://doi.org/10.5281/zenodo.4275676, 2020. a
Gustafson, J. L.: Reevaluating Amdahl′s law, Commun. ACM, 31, 532–533, https://doi.org/10.1145/42411.42415, 1988. a
Hager, G., Wellein, G., and Treibig, J.: LIKWID: A Lightweight PerformanceOriented Tool Suite for x86 Multicore Environments, in: 2010 39th International Conference on Parallel Processing Workshops, IEEE Computer Society, Los Alamitos, CA, USA, 13–16 September 2010, pp. 207–216, https://doi.org/10.1109/ICPPW.2010.38, 2010. a
Harten, A.: High resolution schemes for hyperbolic conservation laws, J. Computat. Phys., 135, 260–278, https://doi.org/10.1006/jcph.1997.5713, 1997. a
Hirsch, C.: Numerical computation of internal and external flows, in: Computational Methods for Inviscid and Viscous Flows, vol. 2, John Wiley & Sons, Chichester, England and New York, 1990. a
Huang, Y., Guo, N., Seok, M., Tsividis, Y., Mandli, K., and Sethumadhavan, S.: Hybrid analogdigital solution of nonlinear partial differential equations, in: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, ACM, 665–678, https://doi.org/10.1145/3123939.3124550, 2017. a
Karplus, W. and Russell, R.: Increasing Digital Computer Efficiency with the Aid of ErrorCorrecting Analog Subroutines, IEEE T. Comput., C20, 831–837, https://doi.org/10.1109/TC.1971.223357, 1971. a
Kendon, V. M., Nemoto, K., and Munro, W. J.: Quantum analogue computing, Philos. T. R. Soc. A, 368, 3609–3620, https://doi.org/10.1098/rsta.2010.0017, 2010. a
Köppel, S.: Towards an exascale code for GRMHD on dynamical spacetimes, J. Phys. Conf. Ser., 1031, 012017, https://doi.org/10.1088/17426596/1031/1/012017, 2018. a
MacLennan, B. J.: Natural computation and nonTuring models of computation, Theor. Comput. Sci., 317, 115–145, https://doi.org/10.1016/j.tcs.2003.12.008, 2004. a
MacLennan, B. J.: Analog Computation, in: Computational Complexity, edited by: Meyers, R., Springer, New York, pp. 161–184, https://doi.org/10.1007/9781461418009_12, 2012. a
MacLennan, B. J.: Unconventional Computing, University of Tennessee, Knoxville, Tennessee, USA, available at: http://web.eecs.utk.edu/~bmaclenn/Classes/494594UC/handouts/UC.pdf (last access: 27 July 2021), 2019. a
Michoski, C., Milosavljević, M., Oliver, T., and Hatch, D. R.: Solving differential equations using deep neural networks, Neurocomputing, 399, 193–212, https://doi.org/10.1016/j.neucom.2020.02.015, 2020. a
Nomura, T. and Deiters, R.: Improving the analog simulation of partial differential equations by hybrid computation, Simulation, 11, 73–80, https://doi.org/10.1177/003754976801100207, 1968. a
Reihing, J.: A timesharing analog computer, in: Proceedings of the western joint computer conference, San Francisco, CA, USA, 3–5 March 1959, 341–349, https://doi.org/10.1145/1457838.1457904, 1959. a
Rodgers, D. P.: Improvements in Multiprocessor System Design, SIGARCH Comput. Archit. News, 13, 225–231, https://doi.org/10.1145/327070.327215, 1985. a, b
Röhl, T., Eitzinger, J., Hager, G., and Wellein, G.: LIKWID Monitoring Stack: A Flexible Framework Enabling Job Specific Performance monitoring for the masses, 2017 IEEE International Conference on Cluster Computing (CLUSTER), Honolulu, HI, USA, 5–8 September 2017, 781–784, https://doi.org/10.1109/CLUSTER.2017.115, 2017. a
Schenck, C. and Fox, D.: Spnets: Differentiable fluid dynamics for deep neural networks, in: Conference on Robot Learning, PMLR, 87, 317–335, available at: http://proceedings.mlr.press/v87/schenck18a.html (last access: 27 July 2021), 2018. a
Schuman, C. D., Potok, T. E., Patton, R. M., Birdwell, J. D., Dean, M. E., Rose, G. S., and Plank, J. S.: A Survey of Neuromorphic Computing and Neural Networks in Hardware, arXiv [preprint], arXiv:1705.06963v1, 19 May 2017. a
Shu, C.W.: High order WENO and DG methods for timedependent convectiondominated PDEs: A brief survey of several recent developments, J. Comput. Phys., 316, 598–613, https://doi.org/10.1016/j.jcp.2016.04.030, 2016. a
Siegelmann, H. T.: Computation Beyond the Turing Limit, Science, 268, 545–548, https://doi.org/10.1126/science.268.5210.545, 1995. a
Sod, G.: Numerical Methods in Fluid Dynamics: Initial and Initial BoundaryValue Problems, Cambridge University Press, Cambridge, UK, 1985. a
Subramaniam, B., Saunders, W., Scogland, T., and Feng, W.c.: Trends in EnergyEfficient Computing: A Perspective from the Green500, in: Proceedings of the International Green Computing Conference, Arlington, VA, USA, 27–29 June 2013, https://doi.org/10.1109/IGCC.2013.6604520, 2013. a
Subramaniam, B., Scogland, T., Feng, W.c., Cameron, K. W., and Lin, H.: Green 500 List, 2020, available at: http://www.green500.org (last access: 28 July 2021), 2020. a
Titarev, V. A. and Toro, E. F.: ADER: Arbitrary High Order Godunov Approach, J. Sci. Comput., 17, 609–618, https://doi.org/10.1023/A:1015126814947, 2002. a
Titarev, V. A. and Toro, E. F.: ADER schemes for threedimensional nonlinear hyperbolic systems, J. Comput. Phys., 204, 715–736, https://doi.org/10.1016/j.jcp.2004.10.028, 2005. a
Toro, E. F.: Primitive, Conservative and Adaptive Schemes for Hyperbolic Conservation Laws, in: Numerical Methods for Wave Propagation. Fluid Mechanics and Its Applications, edited by: Toro, E. F. and Clarke, J. F., vol. 47, Springer, Dordrecht, the Netherlands, pp. 323–385, https://doi.org/10.1007/9789401591379_14, 1998. a
Ulmann, B.: Model1 Analog Computer Handbook/User Manual, available at: http://analogparadigm.com/downloads/handbook.pdf (last access: 28 July 2021), 2019. a, b
Ulmann, B.: Analog and Hybrid Computer Programming, De Gruyter Oldenbourg, Berlin, Boston, https://doi.org/10.1515/9783110662207, 2020. a, b
Vichnevetsky, R.: A new stable computing method for the serial hybrid computer integration of partial differential equations, in: Spring Joint Computer Conference, Atlantic City New Jersey, 30 April–2 May 1968, 143–150, https://doi.org/10.1145/1468075.1468098, 1968. a
Vichnevetsky, R.: Hybrid methods for partial differential equations, Simulation, 16, 169–180, 1971. a
Volynskii, B. A. and Bukham, V. Y.: Analogues for the Solution of BoundaryValue Problems, 1st edn., in: International Tracts in Computer Science and Technology and Their Application, Oxford, London, 1965. a
Wang, W., Zhu, Y., Chan, C.H., and Martins, R. P.: A 5.35mW 10MHz SingleOpamp ThirdOrder CT Delta Sigma Modulator With CTC Amplifier and Adaptive Latch DAC Driver in 65nm CMOS, IEEE J. SolidSt. Circ., 53, 2783–2794, https://doi.org/10.1109/jssc.2018.2852326, 2018. a
Wang, Y., Yu, B., Berto, F., Cai, W., and Bao, K.: Modern numerical methods and their applications in mechanical engineering, Adv. Mech. Eng., 11, 1–3, https://doi.org/10.1177/1687814019887255, 2019. a
Wilhelm, F., Steinwandt, R., Langenberg, B., Liebermann, P., Messinger, A., Schuhmacher, P., and MisraSpieldenner, A.: Status of quantum computer development, Version 1.2, BSI Project Number 283, Federal Office for Information Security, available at: https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Publikationen/Studien/Quantencomputer/P283_QC_StudieV_1_2.html (last access: 28 July 2021), 2020. a
Zhou, Y., Stoudenmire, E. M., and Waintal, X.: What Limits the Simulation of Quantum Computers?, Phys. Rev. X, 10, 041038, https://doi.org/10.1103/physrevx.10.041038, 2020. a
Ziegler, M.: Novel hardware and concepts for unconventional computing, Sci. Rep., 10, 11843, https://doi.org/10.1038/s41598020688341, 2020. a
This equation is inspired by the Dahlquist and Jeltsch (1979) test equation ${y}^{\prime}=\mathit{\lambda}y$ used for stability studies. The advantage of using an oscillator is the selfsimilarity of the solution which can be observed over a long time.
Both in terms of dense output or any kind of evolution tracking. A textbooklevel approach with minimal memory footprint is adopted which could be considered an inplace algorithm.
sic! We either argue with overall FLOP and Energy (Joule) or per second quantities such as FLOP/s (in short FLOPS) and Power (Watt). In order to avoid confusion, we avoid the abbreviation “FLOPS” in the main text. Furthermore, SI prefixes are used, i.e., kFLOP=10^{3} FLOP, MFLOP=10^{6} FLOP and GFLOP=10^{9} FLOP.
Summation will be done implicitly on chip by making use of Kirchhoff's law (current summing) so that no explicit computing elements are required for this operation.
The explicit dependency on r and t is omitted in the following text.
h−p methods, which provide both mesh refinement in grid spacing h as well as a “local” high order description typically in some function base expansion of order p. For reviews, see for instance Cockburn and Shu (2001) or Shu (2016).
For a high order time integration scheme, the cutoff increases formally linearly as ${f}_{\mathrm{0}}\sim p/\left(\mathrm{10}{T}_{\mathrm{DOF}}\right)$. That is, for a fourth order scheme, the digital computer is effectively four times faster in this comparison.
Note that on a digital computer, the maximum frequency is identical to a cutoff frequency (also referred to as an ultraviolet cutoff). On analog computers, there is no such hard cutoff, as computing elements tend to be able to compute, with decreased quality, at higher frequencies.