Introduction

Non-invasive mechanical ventilation (NIV) is a technique of ventilatory assistance in which the endotracheal tube is replaced by a non-invasive interface. NIV has proved effective in reducing the need for endotracheal intubation and has accordingly gained popularity. As the use of NIV has increased, manufacturers have continuously marketed new devices, both ventilators and interfaces, for NIV application. Several bench studies evaluating and comparing the performance of NIV devices have been published in parallel. Bench studies help clinicians appreciate the differences in performance between devices and, together with other features such as user-friendliness [1], are likely to influence the choice of NIV equipment. No recommendation or specific requirement, however, exists for these evaluations, whose results are often inconsistent and sometimes even conflicting between studies.

In the present article we review and discuss the techniques utilized in evaluating devices for NIV, illustrating those aspects that may explain discrepancies. We focus our attention on the following technical aspects: (1) lung models and simulation of inspiratory demand and effort; (2) mechanical properties of the virtual respiratory system; (3) generation and quantification of air leaks; (4) ventilator modes and settings; (5) assessment of the interface-ventilator unit performance, based on the evaluation of (a) inspiratory trigger, (b) inspiration-to-expiration (I:E) cycling, and (c) volume delivery or (d) inspiratory assistance and rate of pressurization. We do not consider the studies evaluating devices for delivery of continuous positive airway pressure (CPAP).

Lung models and simulated demand and effort

To simulate spontaneous breathing four types of lung models are used: (1) two-chamber lung model driven by a ventilator [2–17], (2) electrically driven pneumatic lung simulator [11, 18], (3) bellows-in-a-box lung model, in which sub-atmospheric pressure is generated by either a jet flow determining a Venturi effect [19–23] or a pump [24–26], (4) microprocessor-controlled piston [27–32]. The pressure–time profile varies between lung models: digitally controlled simulators allow full control of the breath profile, whereas with the other models the shape of the pressure–time profile depends on the external pressure generator. When assessing the actually delivered volume, a passive lung model with adjustable compliance and resistance is also utilized [33–35].

Inspiratory demand, when reported, is defined by either the peak inspiratory flow rate (30–120 L/min) [8, 15, 18–22, 26, 29], the inspiratory flow at 100 ms [14], or the drop in airway pressure (P aw) occurring after 100 ms when occluding the airway opening, referred to as P 0.1 [3, 7, 8, 12, 14, 15, 23, 31]. The extent of the simulated effort is described by either the plateau pressure on the driving ventilator, when using a two-chamber lung model [9, 10, 13], or the negative pressure in the box, with the bellows-in-a-box model [24, 26], or the maximum inspiratory pressure drop generated by the digitally controlled simulator [28, 31, 32]. Several studies evaluate multiple levels of demand and effort [8, 9, 15, 18–21, 23, 29], but only a few use identical values that allow full comparison of results [19–21].

Differences between lung models and between the ways of generating inspiratory effort may lead to conflicting results. Two studies performed with different lung simulators produced conflicting results when comparing the triggering performance of the same two ventilators [15, 29]. Two other studies using different lung models and modalities of simulated effort generation also reached opposite conclusions when assessing and comparing the triggering function of the same ventilators [8, 22]. As shown in Fig. 1, inspiratory demand remarkably affects triggering performance, with respect to both delay and sensitivity. Inspiratory demand and extent of the simulated effort also influence the rate of pressurization [8, 12, 19, 29].

Fig. 1
Simulation of the effects of varying inspiratory demand on trigger delay and sensitivity. At higher inspiratory demand the trigger delay is shorter, while the magnitude of the negative deflection in airway pressure (P aw) is greater. Quite the opposite occurs at a lower inspiratory demand, i.e., longer trigger delay and smaller P aw negative deflection

Whereas the impact of using different test lung models on the results is unclear and deserves elucidation, the simulated demand and effort greatly influence the performance of the ventilator and should definitely be standardized.

Mechanical properties of the “virtual” respiratory system

The mechanical properties of the “virtual” patient’s respiratory system are set to mimic obstructive and/or restrictive disorders, or normal lungs. The values of resistance for the “obstructive” setting range from 10 to 50 cmH2O/L/s [5–7, 9, 10, 13, 16, 27, 29–31, 34, 35], whereas the values of compliance simulating “restriction” vary between 20 and 60 mL/cmH2O [5, 7, 9–11, 13, 16, 30, 34, 35]. In several studies multiple settings are evaluated within the same study protocol [5–7, 9–11, 13, 14, 16, 17, 19, 27, 30, 34, 35]. In their lung model, Fauroux et al. [14] set the breathing pattern, respiratory mechanics, and inspiratory demand at values previously obtained from measurements performed on patients.

Since the performance of the ventilators is remarkably influenced by the severity of the impairment in respiratory mechanics [5, 14, 17], the use of standard reference values for respiratory system resistance and compliance would definitely facilitate comparisons among studies. Values of resistance of 5, 10, 20, and 50 cmH2O/L/s, for instance, might be used to characterize absent, moderate, severe, and extreme obstruction, respectively; similarly, values of compliance of 100, 50, and 25 mL/cmH2O might be used to mimic absent, mild, and severe restriction.

Air leaks

Several studies evaluate the effects of unintentional air leaks on the performance of ventilators in delivering NIV [5, 6, 10, 13, 14, 17, 24–27, 31, 34]. The extent of the air leaks varies remarkably among studies, ranging between 6 L/min [10] and 120 L/min [6]. Although leak valve modules [14, 17, 31] and resistors [6, 34] have been utilized in bench simulations to mimic unintentional air leaks, these are more frequently obtained through an orifice placed in the circuit or the interface [5, 10, 13, 24–27], this latter approach being definitely easy and apparently as efficient as the others. Irrespective of the technique utilized to generate unintentional air leaks, bench simulations mimic the clinical setting regarding the influence of the extent of the inspiratory support on leaks [10, 16, 26, 30] and the consequent variations throughout the respiratory cycle [27], but generally do not consider the complex relationship between air leak and mask fit [36].

Ventilator modes and settings

Flow-cycled pressure-targeted modes, i.e., pressure support ventilation (PSV) or bi-level positive airway pressure (BiPAP), are evaluated in the large majority of studies [2, 3, 8–13, 15, 18–23, 25–29, 31, 32]. One [3, 5, 13, 22, 26–28, 30–32], two [18, 20, 23], three [2, 8, 10, 12, 16, 19, 21], or four [9, 11, 15] support levels, ranging from 5 to 23 cmH2O, have been tested.

On the one hand, assessment of the interface-ventilator unit performance during pressure-targeted modes considers primarily the synchrony between the lung model and the ventilator (i.e., inspiratory trigger and I:E cycling), the amount of assistance provided throughout inspiration, and the speed of achievement of the preset inspiratory pressure, i.e., rate of pressurization. On the other hand, studies considering volume-targeted modes, essentially volume assist/control (VA/C) [6, 14, 17, 24, 27, 29, 33, 35], evaluate, in addition to synchrony [14, 27, 29], the relationship between preset and actually delivered volume [6, 14, 17, 24, 33, 35], considering the impact on this relationship of either adding air leaks [6, 17, 24] or varying respiratory mechanics [17, 33, 35]. Positive end expiratory pressure (PEEP) is generally set at 5 cmH2O [8, 13, 15, 18, 19, 21–23, 26, 28, 29, 32] or at the minimal value allowed by the ventilator tested [2, 9, 10]. PEEP values of 5 [18, 28, 32] or 8 cmH2O [11] have been used in studies evaluating helmets.

Some studies are specifically designed to assess the effects of varying inspiratory trigger sensitivity [3, 6], pressurization rate [22, 23, 28, 32], and I:E cycling criteria [22, 28, 32]. These investigations apart, the inspiratory trigger setting is frequently defined as maximum sensitivity not determining auto-triggering [2, 7, 9, 12, 14, 16, 17, 21, 22, 25, 26, 28, 30, 32], whereas the fastest pressurization rate is commonly utilized for bench testing [2, 9, 12, 14, 21, 25]. I:E cycling criteria are in general maintained at the default values proposed by the manufacturers [8, 17, 21, 23], which are not necessarily equal among devices.

The choice of a combination of standard settings within a range of predefined values for PEEP (e.g., 5, 10, and 15 cmH2O) and inspiratory pressure (e.g., 5, 10, 15, and 20 cmH2O), and a clear definition of the inspiratory and expiratory (i.e., I:E cycling criteria) trigger would help to compare results.
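
As an illustration only, the example values suggested in this article for resistance, compliance, PEEP, and inspiratory pressure can be combined into a list of candidate test conditions. The sketch below is in Python; the variable names and the choice of crossing all four factors are assumptions made for illustration, not a recommendation drawn from the studies reviewed.

```python
from itertools import product

# Example values quoted in the text; the full crossing below is an assumption.
RESISTANCE = [5, 10, 20, 50]        # cmH2O/L/s: absent, moderate, severe, extreme obstruction
COMPLIANCE = [100, 50, 25]          # mL/cmH2O: absent, mild, severe restriction
PEEP = [5, 10, 15]                  # cmH2O
INSPIRATORY_PRESSURE = [5, 10, 15, 20]  # cmH2O

# One dictionary per candidate bench condition.
conditions = [
    {"resistance": r, "compliance": c, "peep": peep, "inspiratory_pressure": ip}
    for r, c, peep, ip in product(RESISTANCE, COMPLIANCE, PEEP, INSPIRATORY_PRESSURE)
]

print(len(conditions), "candidate conditions")
print(conditions[0])
```

In practice a study would probably test only a subset of these combinations; the point is that a shared, explicit list makes settings comparable across studies.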

Other ventilatory modes have been tested in very few bench studies: proportional assist ventilation (PAV) was tested in one study aimed at evaluating triggering performance of different ventilators [27], and the ability to deliver the target minute ventilation with changes of compliance and resistance and with the addition of air leaks was evaluated during volume-assured non-invasive PSV [34]. The currently available lung models do not allow bench evaluation of neurally adjusted ventilatory assist (NAVA).

Assessment of interface-ventilator unit performance

Inspiratory trigger

The assessment of inspiratory trigger function includes evaluation of (1) synchronization between simulated inspiratory effort and onset of ventilator assistance, and (2) trigger sensitivity. Synchronization is evaluated by assessing the time lag between the onset of the simulated effort and either (1) the initial delivery of flow [3, 30], (2) the lowest airway pressure (P aw) attained during triggering [7, 9–15, 20–23, 26, 28–32], as described in Fig. 2 by the interval AB on the x-axis, or (3) the point at which P aw returns to the preset expiratory pressure [8, 11, 15, 16, 19, 21–23, 25, 27, 29, 31], as also depicted in Fig. 2 by the interval AC on the x-axis (time). In order to avoid computational problems arising from small fluctuations in baseline pressure, one study considers 3 cmH2O above preset expiratory pressure as return to baseline [27]. A uniform terminology is missing for these time intervals, which are variously termed “delay time” [8, 19–23, 27], “delay trigger” [8–13, 22, 28, 32], “trigger time” [16, 22, 29], “trigger time delay” [14, 25], “time to trigger” [31], “triggering delay” [15, 30], “delay PEEP” [11], “time delay” [7, 29], “time to baseline” [31], and “inspiratory delay” [15, 26].

Fig. 2
Simulated tracings of airway pressure (P aw) and flow during a simulated breath. A and B indicate onset of effort and inspiratory support, respectively; C is the point of return to baseline P aw. D and E designate end of effort and ventilator assistance, respectively. Simulated patient’s inspiration is indicated by the dashed line in the lower part of the figure, whereas mechanical inspiration corresponds to the solid line in the upper portion of the figure. The dotted horizontal line indicates zero flow. The interval AB on the x-axis (time) represents the trigger delay, whereas it corresponds to trigger sensitivity on the y-axis (P aw). See text for further details

Trigger sensitivity is determined as the difference between baseline and nadir P aw, namely “inspiratory trigger pressure”, “pressure drop”, or “pressure fall”, corresponding in Fig. 2 to the interval AB on the y-axis (P aw) [7, 8, 12–14, 19, 21–23, 25, 26, 28, 29, 31]. To evaluate the “effort” spent to trigger the ventilator, several studies calculate the P aw–time product during the triggering phase (PTPt) [2, 3, 8–13, 21–23, 28, 30]. PTPt is calculated from the onset of “effort” to either nadir P aw [3, 11, 12] (area ABX in Fig. 2) or return to P aw baseline [2, 8–10, 13, 21, 22, 28, 30] (area ABC in Fig. 2).

The assessment of inspiratory trigger function is affected by the measurement criteria. The triggering delays of the same ventilators did not differ between two studies adopting the same measurement criteria, even though they used different test lung simulators [19, 27]. Measuring the inspiratory trigger delay as the time lag between onset of effort and start of pressurization, i.e., nadir P aw (Fig. 2, interval AB), is the most straightforward approach, because the time lag between start of pressurization and return to P aw baseline (Fig. 2, interval BC) is influenced by the rate of pressurization. Determining the difference between baseline and nadir P aw is a relatively straightforward approach to define trigger sensitivity; it is worth remembering, however, that this value is affected by the simulated inspiratory demand (Fig. 1), by the characteristics of the ventilator circuit, such as length, compliance, and presence of a heat and moisture exchanger or other resistive elements [26], and by those of the NIV interface, such as compliance and internal volume [8, 19, 29].
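
The two measurements recommended above, trigger delay as the time from onset of effort to nadir P aw and trigger sensitivity as the difference between baseline and nadir P aw, can be sketched on a sampled pressure trace. The Python example below is a minimal illustration: the function name, the 0.5 s search window, and the synthetic trace (10 ms sampling, PEEP of 5 cmH2O, inspiratory pressure of 15 cmH2O) are all assumptions and do not come from any of the studies reviewed.

```python
def trigger_delay_and_sensitivity(t, paw, effort_onset, baseline, window=0.5):
    """Return (delay in s, pressure drop in cmH2O) for one simulated breath.

    t, paw       : sample times (s) and airway pressure (cmH2O), same length
    effort_onset : time of onset of the simulated effort (point A), s
    baseline     : preset expiratory pressure, cmH2O
    window       : assumed search window after effort onset, s
    """
    # Samples lying inside the search window that starts at effort onset.
    idx = [i for i, ti in enumerate(t)
           if effort_onset <= ti <= effort_onset + window]
    # Nadir P aw (point B): lowest pressure reached during triggering.
    i_nadir = min(idx, key=lambda i: paw[i])
    delay = t[i_nadir] - effort_onset   # interval AB on the time axis
    drop = baseline - paw[i_nadir]      # interval AB on the pressure axis
    return delay, drop


def synthetic_paw(i):
    """Hypothetical P aw (cmH2O) at sample i, one sample every 10 ms."""
    if i < 10:                           # before effort onset: PEEP
        return 5.0
    if i <= 20:                          # triggering phase: pressure falls
        return 5.0 - 0.15 * (i - 10)
    if i <= 40:                          # pressurization ramp
        return 3.5 + 11.5 * (i - 20) / 20
    return 15.0                          # inspiratory plateau


t = [i / 100 for i in range(100)]
paw = [synthetic_paw(i) for i in range(100)]
delay, drop = trigger_delay_and_sensitivity(t, paw, effort_onset=0.10, baseline=5.0)
print(f"trigger delay: {delay * 1000:.0f} ms, pressure drop: {drop:.2f} cmH2O")
```

The search window is a practical safeguard so that only the triggering phase is scanned for the nadir; its length would need to be adapted to the breath pattern under test.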

When PTPt is measured from onset of inspiratory effort to return to PEEP, the area obtained (area a in Fig. 3) is affected by the rate of pressurization [11], as illustrated in Fig. 4. Because PTPt depends on a multiplicity of factors (i.e., trigger delay and sensitivity, inspiratory demand, and rate of pressurization), the meaning of this value necessitates interpretation and must be contextualized to avoid misleading conclusions.
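
The dependence of PTPt on the chosen end point can be made concrete with a short numerical sketch. The Python example below integrates the pressure deficit below baseline by the trapezoidal rule, from onset of effort either to nadir P aw or to the return of P aw to baseline. The function name, the index-based interface, and the synthetic trace are assumptions made for illustration; the trace is expected to contain a single breath.

```python
def ptp_trigger(paw, dt, i_onset, baseline, end="baseline"):
    """Pressure-time product during triggering (cmH2O*s).

    paw      : airway pressure samples (cmH2O), one breath
    dt       : sampling interval (s)
    i_onset  : sample index of onset of the simulated effort
    baseline : preset expiratory pressure (cmH2O)
    end      : "nadir" or "baseline" (return of P aw to baseline)
    """
    # Nadir P aw: lowest pressure at or after effort onset.
    i_nadir = min(range(i_onset, len(paw)), key=lambda i: paw[i])
    if end == "nadir":
        i_end = i_nadir
    else:
        # First sample after the nadir at which P aw is back at baseline
        # (raises StopIteration if the trace never returns to baseline).
        i_end = next(i for i in range(i_nadir, len(paw)) if paw[i] >= baseline)
    # Pressure deficit below baseline, clipped at zero.
    deficit = [max(baseline - p, 0.0) for p in paw[i_onset:i_end + 1]]
    # Trapezoidal integration over the selected interval.
    return sum((a + b) / 2 * dt for a, b in zip(deficit, deficit[1:]))


def synthetic_paw(i):
    """Hypothetical P aw (cmH2O) at sample i, one sample every 10 ms."""
    if i < 10:
        return 5.0
    if i <= 20:
        return 5.0 - 0.15 * (i - 10)
    if i <= 40:
        return 3.5 + 11.5 * (i - 20) / 20
    return 15.0


paw = [synthetic_paw(i) for i in range(100)]
ptpt_nadir = ptp_trigger(paw, dt=0.01, i_onset=10, baseline=5.0, end="nadir")
ptpt_baseline = ptp_trigger(paw, dt=0.01, i_onset=10, baseline=5.0, end="baseline")
print(f"PTPt to nadir: {ptpt_nadir:.4f} cmH2O*s")
print(f"PTPt to return to baseline: {ptpt_baseline:.4f} cmH2O*s")
```

The second value exceeds the first by the area accumulated between nadir and return to baseline, which is the portion that shrinks when pressurization is faster.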

Fig. 3
The airway pressure (P aw)–time product of the whole breath (PTPaw) resulting from b + c + d − a, where area a corresponds to the airway pressure–time product during triggering (PTPt). P aw–time products of the first 300 ms (PTP300) and 500 ms (PTP500) correspond to areas b − a, and b + c − a, respectively. The dotted area indicates the ideal PTPaw, as would occur with an immediate pressurization at the preset inspiratory pressure. See text for further explanation

Fig. 4
Simulated airway pressure–time curves depicting the effects of inspiratory trigger function (synchronization and sensitivity) and rate of pressurization on the airway pressure–time product during the initial 500 ms (PTP500). As described in Fig. 3, PTP500 is commonly obtained by subtracting the black area, corresponding to the airway pressure–time product during triggering (PTPt), from the gray area. Trigger function is either good (left) or poor (right), whereas the rate of pressurization worsens from top (A) to bottom (C). Both trigger function and rate of pressurization influence PTP500. At each level of trigger function PTP500 is higher in A, corresponding to excellent pressurization rate, diminishes at the intermediate level (B), and further decreases in C, simulating poor pressurization rate. At each rate of pressurization, PTP500 is higher in A 1, B 1, and C 1, corresponding to good trigger function, as opposed to in A 2, B 2, and C 2, corresponding to poor trigger function. Noteworthy, PTPt is also affected by the rate of pressurization, being reduced at improved pressurization. See text for further explanation

I:E cycling

The termination of mechanical insufflation, corresponding to the opening of the expiratory valve, is considered as either the point where P aw falls below the preset inspiratory pressure [19, 21, 22, 27, 30, 31] or the point where inspiratory flow drops to zero [9, 10, 13, 16, 28, 32]. The synchronization between end of simulated inspiratory effort and termination of machine insufflation is presented as either the time interval between end of effort and termination of support [16, 19, 21, 22, 27, 28, 30–32] (interval DE in Fig. 2), or the ratio between this time interval and the length of the simulated inspiratory effort, i.e., [(AE − AD)/AD] × 100 [9, 10, 13]. These values can be either positive, in cases of delayed ventilator cycling-off, or negative, when the end of the ventilator assistance anticipates the end of the simulated inspiration. The time during which simulated effort and ventilator assistance are in phase is proposed as an index of patient–ventilator synchrony [28, 32], namely “time of assistance” [28] or “time of synchrony” [32]. Though never proposed, “time of synchrony” could be expressed as a fraction of the overall duration of inspiratory effort. Both premature and delayed cycling are sources of patient–ventilator dyssynchrony. Reporting the delay between end of effort and offset of ventilator pressurization together with the time during which effort and assistance are in phase [28, 32] is probably the easiest way to describe I:E cycling performance. A standard reference point to define the end of mechanical insufflation would also be valuable; because inspiratory flow may persist beyond the end of ventilator insufflation, the point where P aw falls below the preset inspiratory pressure is preferable to the point of zero flow.
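
The cycling indices described above reduce to simple arithmetic on the time points of Fig. 2 (A onset of effort, B onset of support, D end of effort, E end of support). The Python sketch below is illustrative only; the function names and the example timings are assumptions.

```python
def cycling_delay(t_a, t_d, t_e):
    """Return (delay DE in s, delay as % of effort duration AD).

    Positive values indicate delayed cycling-off, negative values
    indicate premature cycling-off.
    """
    delay = t_e - t_d                 # interval DE
    effort_duration = t_d - t_a       # interval AD
    return delay, 100.0 * delay / effort_duration


def time_of_synchrony(t_a, t_b, t_d, t_e):
    """Return (time in phase in s, fraction of effort duration AD)."""
    # Overlap between simulated effort (A to D) and ventilator support (B to E).
    overlap = max(min(t_d, t_e) - max(t_a, t_b), 0.0)
    return overlap, overlap / (t_d - t_a)


# Hypothetical timings (s): effort from 0.0 to 1.0, support from 0.1 to 1.2.
delay, delay_pct = cycling_delay(t_a=0.0, t_d=1.0, t_e=1.2)
sync, sync_fraction = time_of_synchrony(t_a=0.0, t_b=0.1, t_d=1.0, t_e=1.2)
print(f"cycling delay: {delay:.2f} s ({delay_pct:.0f}% of effort)")
print(f"time of synchrony: {sync:.2f} s ({sync_fraction:.0%} of effort)")
```

With a premature cycling-off (for example, end of support at 0.8 s) the same functions return a negative delay and a shorter time of synchrony.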

Inspiratory assistance and rate of pressurization in pressure-targeted modes

The capacity of a ventilator-interface unit to unload a patient’s respiratory muscles depends on the amount of assistance provided throughout inspiration and on the speed of achievement of the preset inspiratory pressure, i.e., rate of pressurization.

The initial rate of pressurization has been assessed by computing flow acceleration from zero to 85% of peak inspiratory flow [2], or the time required for P aw to rise from baseline up to 90% of the end-inspiratory value (T 90%) [20, 31], or the rate of pressurization during the first 150 ms [14]. Some authors propose the ventilator delivered peak flow as an index of the speed of pressurization [19, 21, 22].

To evaluate and compare the performance of different machines during PSV, Lofaso et al. [2] calculate the work performed by the ventilator using the dynamic pressure–volume loop; they express it as a percentage of the ideal (maximal) mechanical work, represented by a perfectly squared P aw–time curve profile, as would occur with an immediate achievement of the preset inspiratory pressure. Subsequent studies have assessed ventilator performance by computing the integral of P aw over time of insufflations, using PEEP as baseline, referred to as P aw–time product (PTPaw). This area is expressed either as an absolute value (cmH2O s) or the ratio between the actual (measured) area and an ideal area, i.e., a perfectly squared P aw–time curve profile [9, 10, 13, 19, 22, 28] (Fig. 3). Bunburaphong et al. [19] and Chatmongkolchart et al. [19, 22] express PTPaw both as an absolute value and percentage of the ideal area. Considering the initial pressurization of the ventilator as a crucial PSV feature, Richard et al. [8] compute PTPaw over the first 300 ms (PTP300) and 500 ms (PTP500), in addition to PTPaw of the whole time of insufflation. Jaber et al. [12] also determine PTP300 and PTP500, whereas Thille et al. [15] just PTP300. Borel et al. [30] calculate PTP500 starting from the point where P aw returns to baseline. Other authors propose PTP300 and PTP500 as a percentage of the ideal PTPaw [9, 10, 13, 16].

The assessment of the performance of NIV interfaces is based on analogous criteria. Chiumello et al. [18] compute PTPaw from onset to end of inspiratory flow, as an index of the pressurization achieved by the interface-ventilator unit, and the time lag from onset of inspiratory flow to achievement of the preset pressure support level (T ps), as an index of the pressurization delay [18]. Moerer et al. [11] calculate PTPaw from onset to end of mechanical insufflation, using PEEP as baseline. Costa et al. [28, 32] propose the time of pressurization (Timepress) as the time necessary to achieve the preset inspiratory pressure during PSV, and express PTP300 as an absolute value and PTP500 as a percentage of the ideal PTP over the first 500 ms [28].

The inspiratory assistance provided by the ventilator is definitely affected by the rate of pressurization, a rapid rise in P aw resulting in more efficient assistance [37, 38], as depicted in Fig. 4. PTPaw is an index of performance, but can be remarkably affected by the experimental setup and the methodology of assessment. The evaluation of the pressurization performance of the same ventilators yielded opposite results in two studies using different settings of the driving ventilator and diverse criteria for PTPaw assessment [13, 15]. When PTPaw is calculated over the whole time of insufflation, the impact of a slower pressurization rate on the overall area can be offset by a later termination of the mechanical insufflation [8], which may explain the quite modest differences in PTPaw observed when comparing the endotracheal tube with the helmet [11]. Limiting the time of PTPaw calculation to the initial 300 or 500 ms of the inspiratory phase corrects for this bias and provides valuable information on the interface-ventilator unit’s ability to rapidly achieve the preset inspiratory pressure [8, 12]. Expressing PTPaw as a percentage of an ideal pressurization rather than as an absolute value allows comparison of data acquired with different inspiratory pressure settings, but does not eliminate the confounding effect of applying diverse simulated inspiratory efforts [9, 12, 18].

In all studies but one [30], PTPaw is calculated from the time point corresponding to the onset of simulated inspiratory effort, then including the triggering phase, as depicted in Fig. 3 in which PTPaw is calculated by subtracting the area a (i.e., PTPt) from the sum of areas b, c, and d, i.e., PTPaw = [(b + c + d) − a]; likewise, PTP300 = b − a and PTP500 = [(b + c) − a]. Even though consistent among studies, this approach is not necessarily correct. In fact, as shown in Fig. 4, with this computational approach PTPaw, PTP300, and PTP500 not only depend on the capacity of pressurization, but are also affected by the trigger performance, which has already been separately evaluated [13]. On the one hand, because these indices have been proposed to assess the ability of the ventilator to meet patient inspiratory demand under different dynamic conditions, it could be argued that considering the initial triggering phase in these parameters makes sense from a physiological point of view. On the other hand, determining PTPaw, PTP300, and PTP500 starting from the time point corresponding to the onset of machine insufflation (i.e., at the P aw nadir during the triggering phase) would definitely provide more specific information on the rate of pressurization, which remains not entirely explored when using the approach adopted so far in most studies. PTPaw, PTP300, and PTP500 calculated from the time point corresponding to the onset of simulated inspiratory effort may anyway represent a valuable index of overall performance.
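
The difference between the two starting points can be illustrated numerically. The Python sketch below computes PTP500 by trapezoidal integration of P aw above baseline, once from onset of the simulated effort (so that the triggering area is subtracted) and once from the P aw nadir, and expresses each as a percentage of the ideal area. The function names and the synthetic trace (10 ms sampling, PEEP of 5 cmH2O, preset inspiratory pressure of 15 cmH2O) are assumptions made for illustration.

```python
def ptp(paw, dt, i_start, n_samples, baseline):
    """Signed integral of (P aw - baseline) over n_samples intervals (cmH2O*s)."""
    seg = [p - baseline for p in paw[i_start:i_start + n_samples + 1]]
    # Trapezoidal rule; samples below baseline contribute negative area.
    return sum((a + b) / 2 * dt for a, b in zip(seg, seg[1:]))


def percent_of_ideal(area, duration, p_insp, baseline):
    """Area as % of an immediate, perfectly square pressurization."""
    return 100.0 * area / ((p_insp - baseline) * duration)


def synthetic_paw(i):
    """Hypothetical P aw (cmH2O) at sample i, one sample every 10 ms."""
    if i < 10:
        return 5.0
    if i <= 20:                          # triggering phase, nadir at i = 20
        return 5.0 - 0.15 * (i - 10)
    if i <= 40:                          # pressurization ramp
        return 3.5 + 11.5 * (i - 20) / 20
    return 15.0                          # inspiratory plateau


DT, BASELINE, P_INSP = 0.01, 5.0, 15.0
paw = [synthetic_paw(i) for i in range(100)]

# 500 ms corresponds to 50 sampling intervals at 10 ms per sample.
ptp500_from_effort = ptp(paw, DT, i_start=10, n_samples=50, baseline=BASELINE)
ptp500_from_nadir = ptp(paw, DT, i_start=20, n_samples=50, baseline=BASELINE)

for label, area in (("effort onset", ptp500_from_effort),
                    ("P aw nadir", ptp500_from_nadir)):
    pct = percent_of_ideal(area, 0.5, P_INSP, BASELINE)
    print(f"PTP500 from {label}: {area:.3f} cmH2O*s ({pct:.1f}% of ideal)")
```

On the same trace, the value computed from effort onset is lower because it carries the trigger delay and the triggering area, whereas the value computed from the nadir reflects the pressurization ramp alone.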

Conclusions

Several bench studies have been published to provide clinicians with valuable information about the performance of ventilators and interfaces for NIV. While these studies could (and to some extent should) influence the choice of ventilators and interfaces for NIV, to date no available data demonstrate an impact of bench tests of NIV equipment in either the acute or the chronic clinical setting. Several critical issues make the comparison between devices problematic, as summarized in Table 1. Consistent experimental settings, uniform terminology, and standard measurement criteria would help to enhance the bench assessment of characteristics and the comparison of performance of ventilators and interfaces for NIV. A task force of experts to achieve consensus on these issues would be helpful.

Table 1 Major problems encountered during bench studies evaluating devices for non-invasive ventilation and the possible solutions