Beyond health system contact: measuring and validating quality of childbirth care indicators in primary level facilities of northern Ethiopia



Measurement of quality of health care has been largely overlooked and continues to be a major health system bottleneck in monitoring performance and quality to evaluate progress against defined targets for better decision making. Hence, metrics of maternity care are needed to advance from health service contact alone to content of care. We assessed the accuracy of indicators that describe the quality of basic care for childbirth functions both at the individual level as well as at the population level in Northern Ethiopia.


A validation study was conducted by comparing women’s self-reported coverage of maternal and newborn health interventions received during intra-partum and immediate postpartum care in primary level care facilities of Northern Ethiopia against a gold standard of direct observation by a trained third party (n = 478). For each indicator, we calculated sensitivity and specificity, assessed individual-level reporting accuracy using the area under the receiver operating characteristic curve (AUC), and estimated population-level accuracy using the inflation factor (IF).


A total of 455 (97.5%) women completed the survey describing health interventions. Thirty-two (43.2%) of the 93 basic quality childbirth care indicators that were assessed could be accurately measured at the facility and population level (AUC > 0.60 and 0.75 < IF < 1.25). Valid indicators included whether women and their companion were greeted respectfully, whether an HIV test was offered, and whether the woman experienced severe bleeding (hemorrhage). An additional 21 (28.4%) indicators were accurate at the facility or individual level but would be under- or overestimated at the population level. Thirteen other indicators were accurate at the population level only. Eight (8.6%) indicators met neither validity criterion.


Women were able to accurately report on several indicators of quality for basic childbirth care. The few indicators that required technical understanding tended to draw more "don’t know" responses from women. Therefore, valid indicators should be included as potential measures of quality of the childbirth care process to ensure that essential interventions are delivered.



Plain text summary
What is already known on this topic? As facility deliveries increase and the global community pays greater attention to service quality, observation as a clinical quality assessment tool may be a valuable measure. However, the existing observation-based measures are lengthy, introduce the possibility for measurement error and are difficult to administer.
What does this study add? Our findings revealed that women were able to accurately report on several (n = 32) of the 93 basic quality childbirth care indicators across phases of labor and delivery. These included whether women and their companion were greeted respectfully, whether an HIV test was offered, and whether the woman experienced severe bleeding (hemorrhage). An additional 21 indicators met the facility-level accuracy criterion, 13 met the population-level accuracy criterion, and 8 indicators met neither criterion.
Indicators that met both validity criteria (AUC and IF) could be appropriate for the measurement of quality for care at the facility and population level.
Indicators that do not meet criteria for the AUC but do meet the IF criterion may be suitable to measure intervention coverage at the population level as false positive and negative reporting balance each other at the aggregate level.
Moreover, indicators that meet the AUC criterion but not the IF criterion may be useful for facility-level measurement or individual-level classification, but may be over-reported at the population level. Indicators that met neither criterion are not necessarily invalid for all measurement purposes, but should be used in accordance with the rationale for their distinctive use.
What are the implications for practice and further research?
These accurate indicators are very important for other researchers as a potential measurement of quality of routine child birth care signal functions in the Ethiopian setting. Despite this, there have been few efforts to develop standardized metrics of quality for both mothers and newborns throughout the continuum of maternity.
Therefore, a further nationwide survey should be conducted to facilitate movement toward a collection of fewer but better metrics of quality (content) of care indicators across the phases of labor and delivery services.

Background
The time encompassing labor, delivery and the first 24 h after birth is the highest risk period for maternal and neonatal health. Adhering to the standards of maternity care services is essential to ensure quality services are delivered [1,2]. In low and middle-income country settings, where the vast majority of maternal and newborn deaths occur, coverage estimates of routine facility childbirth interventions often rely on contact with a skilled birth attendant for monitoring purposes [3,4]. But the presence of a skilled birth attendant does not guarantee the actual content of care [5,6]. Research findings indicate that measuring interventions that a woman actually receives is more informative than measuring contact with care providers, provided that women can accurately report this information [4].
Measurement of the quality of processes for basic care of childbirth interventions is complex and requires attention to empirical validation of the indicators to strengthen the quality of measurements [7]. Studies have documented the poor quality and limited sensitivity of obstetric facility records and databases for assessing the performance of care processes in both low and high-resource settings [8].
Health monitoring systems that can provide accurate data on population coverage are often absent, and demographic and health survey programs collect inadequate information on the content of care received during facility childbirth. Additionally, several researchers have highlighted discrepancies between contact with care providers and receiving quality care [9,10]. Previous research notes that a number of composite measures or checklists have been developed through expert opinion, but few have been validated, and suggests that empirical validation is important in strengthening quality measures [11]. While direct observation of clinical care is considered the gold standard for measuring the quality of care, the existing observation-based measures of the childbirth process of care are lengthy, at times including hundreds of indicators [12]. This complexity introduces the possibility of measurement error, makes the measures difficult and costly to administer, and limits their feasibility for routine use in most resource-poor settings [13,14]. In addition, we were unable to identify a published study on the validity of quality indicators of basic childbirth care interventions in Ethiopian settings.
Therefore, it is important to identify alternative indicators that describe the actual basic content of childbirth care and that can be reported accurately, so that they can be included in facility or routine data collection programs and population-based surveys. The aim of this study is to determine which basic childbirth process of care indicators show agreement between the two sets of measures (women's self-reports compared against third-party observations), to explore whether women can appropriately report on these indicators, and to provide suggestions for modifications to data collection procedures that could advance the measurement of maternity care. This investigation also provides an opportunity to apply the results for tracking progress and to enhance the monitoring of effective coverage of essential and basic interventions for both mothers and newborns.

Study setting and population
A facility-based cross-sectional validation study was conducted among primary health care facilities of the South Eastern zone of Tigray, Northern Ethiopia. The zone has four rural districts, namely Degua Tembien, Enderta, Saharti Samre and Hintalo Wajrat. In the districts, there are a total of 4 primary hospitals and 27 health centers [15]. At the time of the study, nearly a quarter of the population (23.4%) was of reproductive age. According to the 2016 Ethiopian Demographic Health Survey, 26% of mothers nationally and 57% in the Tigray region delivered in a health facility [16]. Mothers of reproductive age who received labor and delivery care in primary health care facilities were the source population.

Indicator selection
To identify indicators to be validated, reviews and scans of published and grey literature focused on indicators of the content of care received during facility child birth; this review was conducted between April and June 2018. Indicators were identified by a key term search of maternal health, safe motherhood, quality of care, indicator, valid, skilled birth attendant, obstetric, and intra-partum care. After collecting a list of 112 indicators, a group of reproductive health experts identified 93 key dimensions or set of indicators for validity testing [Additional file 1].
The validation indicators were selected based on the frequency of use and/or potential to assess the essential elements of mothers and newborns. The final indicators were placed into one of four categories: (1) respectful maternity care; (2) content of care; (3) non-indicated obstetrics care; and (4) maternal-neonatal outcome sections.

Sample size
Buderer's formula was used for sample size calculation, as is standard in diagnostic accuracy studies, at the required absolute precision level for sensitivity and specificity [17,18]. The calculation assumed a prevalence of 35.7% (the proportion of mothers who received essential care practices at childbirth in a prior study in India [19]), a type 1 error of α = 0.05, a sensitivity of 80%, a specificity of 60%, an absolute precision of 6%, and a 5% non-response rate. Under these conditions, the target sample size was 478 laboring women to be observed.
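As an illustration of the calculation described above, the following is a minimal sketch of Buderer's formula in Python, using the inputs reported in this section. The function name and structure are illustrative assumptions, not the authors' code. How the 5% non-response allowance entered the original calculation is not stated, so it is deliberately left out of the sketch.

```python
import math

def buderer_sample_sizes(sensitivity, specificity, prevalence, precision, z=1.96):
    """Return the total sample sizes needed to estimate sensitivity and
    specificity to within +/- `precision` (Buderer's formula).

    z is the standard normal quantile for the chosen alpha (1.96 for 0.05).
    """
    # Subjects with the condition are a fraction `prevalence` of the sample.
    n_sens = z**2 * sensitivity * (1 - sensitivity) / (precision**2 * prevalence)
    # Subjects without the condition are a fraction (1 - prevalence).
    n_spec = z**2 * specificity * (1 - specificity) / (precision**2 * (1 - prevalence))
    return n_sens, n_spec

# Inputs reported in the text: Se = 80%, Sp = 60%, prevalence = 35.7%, precision = 6%.
n_sens, n_spec = buderer_sample_sizes(0.80, 0.60, 0.357, 0.06)

# The larger of the two requirements drives the study size.
required = math.ceil(max(n_sens, n_spec))
print(f"n for sensitivity: {n_sens:.1f}, n for specificity: {n_spec:.1f}")
```

With these inputs the sensitivity term is the larger of the two and comes to roughly 478, which is in line with the reported target.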

Sampling technique and participant recruitment
The South-Eastern zone of Tigray region was selected purposively. All health centers with their respective catchment primary hospitals were included. The total sample of delivering women was distributed across the health facilities in proportion to the average number of deliveries per facility per month, and all skilled birth attendants who consented to participate in the study were enrolled. Finally, a consecutive sampling technique was used in which every laboring mother meeting the inclusion criterion (normal first stage of labor) was selected until the required sample size was achieved. Each skilled birth attendant was observed 3-5 times.
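The proportional allocation described above can be sketched as follows. The facility names and monthly delivery counts are hypothetical, and the largest-remainder rounding rule is an assumption made here so that the allocations sum exactly to the target; the text does not state how rounding was handled.

```python
def allocate_proportionally(total_sample, monthly_deliveries):
    """Split `total_sample` across facilities in proportion to their
    average monthly deliveries, using largest-remainder rounding."""
    total_deliveries = sum(monthly_deliveries.values())
    # Exact (fractional) share for each facility.
    raw = {f: total_sample * d / total_deliveries
           for f, d in monthly_deliveries.items()}
    # Start from the integer part of each share.
    allocation = {f: int(share) for f, share in raw.items()}
    # Hand out the remaining units to the largest fractional remainders.
    shortfall = total_sample - sum(allocation.values())
    by_remainder = sorted(raw, key=lambda f: raw[f] - allocation[f], reverse=True)
    for facility in by_remainder[:shortfall]:
        allocation[facility] += 1
    return allocation

# Hypothetical caseloads (deliveries per month), for illustration only.
caseloads = {"Facility A": 60, "Facility B": 25, "Facility C": 15}
allocation = allocate_proportionally(478, caseloads)
print(allocation)
```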

Data collection procedure
Data collection was conducted between July 15, 2018 and October 5, 2018. Twelve pairs of midwives, health officers and nurses worked as data collection teams, with one observer and one interviewer for each facility. Data collectors had previous research experience and were trained for four days. The teams worked in two shifts (day and night). Providers were observed by a third party consisting of trained data collectors using a structured checklist. The indicator matrix or structured checklist [Additional file 2] was developed from the Ethiopian basic emergency obstetrics guidelines [20] and published literature [4,5,9]. The interview questionnaires were translated into the local language, Tigrigna, and underwent minor modifications to improve local understanding and clarity for participants. For the few technical questions, special emphasis was placed on wording them in local expressions that mothers could easily understand. The method of observation was nonintrusive: the health care providers (HCPs) did what they normally do without being interrupted or disturbed by the observer. Observations were used as the reference standard because they reflected all facets of care, including all interactions between the women and the providers. Data were then collected from the delivered mother through exit interviews, using an interviewer-administered questionnaire at the time of facility discharge. Interviewers and observers were not the same individuals and were external to the study facilities to reduce possible social desirability bias.

Statistical analysis
For each participant, the client exit interview and the observation record were matched using a unique identification code. Questions on basic intervention indicators were coded one if the intervention was reported as performed ("Yes"), zero for "No" responses, and all other responses were coded as "don't know (DK)". Unmatched cases, missing and "DK" responses by the woman, as well as indicators with fewer than five counts per cell, were excluded from further validation analysis because they did not fulfill the assumptions of the validity analysis. We assessed two aspects of indicator validity (accuracy of the women's reports against the observer). The first is accuracy at the individual or facility level, calculated as the area under the receiver operating characteristic curve (AUC), which is a plot of the sensitivity (i.e., true positive rate) versus 1-specificity (i.e., false positive rate).
AUC scores range from 0 to 1, with an AUC of 0.5 representing a random guess and an AUC of 1 representing perfect diagnostic accuracy. For the purposes of this study we used an AUC of 0.6 or greater as the a priori benchmark for validity [9].
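To make the individual-level measure concrete, the sketch below cross-tabulates paired observer and self-report codes and derives sensitivity, specificity and an AUC. It rests on one assumption that the text does not confirm: for a single yes/no indicator the ROC curve has only one operating point, so the trapezoidal area reduces to (sensitivity + specificity) / 2. The data are invented for illustration, and "DK" responses are represented as `None` and dropped, in line with the exclusion rule described above.

```python
def two_by_two(observed, reported):
    """Cross-tabulate observer codes (reference standard) against women's
    reports. Codes are 1 = yes, 0 = no, None = don't know / missing."""
    tp = fp = fn = tn = 0
    for obs, rep in zip(observed, reported):
        if obs is None or rep is None:
            continue  # DK and missing responses are excluded from the analysis
        if obs == 1 and rep == 1:
            tp += 1
        elif obs == 0 and rep == 1:
            fp += 1
        elif obs == 1 and rep == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, and the single-threshold AUC."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    auc = (sensitivity + specificity) / 2  # assumed binary-indicator AUC
    return sensitivity, specificity, auc

# Hypothetical paired records for one indicator.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
reported = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, None]

tp, fp, fn, tn = two_by_two(observed, reported)
sensitivity, specificity, auc = accuracy_measures(tp, fp, fn, tn)
print(f"Se = {sensitivity:.2f}, Sp = {specificity:.2f}, AUC = {auc:.2f}")
```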
The second measure of validity is population-level accuracy: the prevalence of the indicator that would be obtained from a population-based survey, calculated from the sensitivity and specificity of each indicator and its true prevalence (i.e., observer report) using the following equation: Population-based prevalence = true prevalence × (sensitivity + specificity − 1) + (1 − specificity). The inflation factor (IF), the ratio of the survey-based prevalence to the true prevalence, was estimated to assess the degree to which each indicator would be over- or under-estimated at the population level [14]. The a priori validation criterion for the IF was set at 0.75 < IF < 1.25. To summarize indicator validity based on reports from women giving birth, we considered whether an indicator met both the individual-level (AUC) and population-level (IF) criteria (0.60 < AUC and 0.75 < IF < 1.25) [12,21,22]. All analyses were performed using Stata Version 14 software [23].
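The population-level calculation and the combined classification rule can be sketched as follows. The equation and thresholds are those stated above; the function names, the category labels and the input values are illustrative assumptions.

```python
def survey_prevalence(true_prevalence, sensitivity, specificity):
    """Prevalence a survey would estimate, given reporting accuracy."""
    return true_prevalence * (sensitivity + specificity - 1) + (1 - specificity)

def inflation_factor(true_prevalence, sensitivity, specificity):
    """Ratio of survey-based prevalence to true (observed) prevalence."""
    return survey_prevalence(true_prevalence, sensitivity, specificity) / true_prevalence

def classify_indicator(auc, inflation):
    """Apply the a priori criteria: AUC > 0.60 and 0.75 < IF < 1.25."""
    meets_auc = auc > 0.60
    meets_if = 0.75 < inflation < 1.25
    if meets_auc and meets_if:
        return "both criteria"
    if meets_auc:
        return "individual level only"
    if meets_if:
        return "population level only"
    return "neither criterion"

# Illustrative inputs, not study results.
inflation = inflation_factor(true_prevalence=0.40, sensitivity=0.90, specificity=0.80)
category = classify_indicator(auc=0.85, inflation=inflation)
print(f"IF = {inflation:.2f} -> {category}")
```

In this example the survey would estimate a prevalence of about 48% against a true 40%, so the IF is about 1.2 and the indicator falls inside the acceptable band.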

Sample descriptive characteristics
Overall, 478 women admitted for labor and delivery consented to participate. Of those who consented, a total of 467 women were enrolled in the study. Among those enrolled, 2.5% (n = 12) were lost to follow-up or discontinued their participation. Finally, a total of 455 observer reports and client exit interviews were accurately matched and analyzed.

Socio-demographic characteristics of participants
The mean age of women was 28 years (SD = 6.38), and ranged between 17 and 45 years. Over a third of women (41%) reported no formal education. Around 95.4% (n = 434) of women received antenatal care provided by skilled health personnel for reasons related to pregnancy at least once during their current pregnancy [Table 1].

Validation results for recognized indicators of quality childbirth care signal functions
Based on a woman's report about her experience of care during childbirth, of the recognized quality care indicators (n = 93) for validity analysis, 14 had a greater than 5% DK response by women and 5 indicators did not fulfill adequate cell size (i.e., at least 5 counts per cell). The latter five indicators included the use of enema, pubic shaving, slapping the newborn, something other than breast milk given to the baby in the first hour of birth and recording the birth weight of the newborn.
Finally, 32 indicators met both validation criteria, 21 indicators met the individual-level criterion only, 13 indicators met the population-level criterion only, and 8 indicators met neither criterion. Regarding "DK" responses, the highest percentage was for the Apgar score indicator (43.5%), while the lowest was for the indicator that the provider palpated the woman's uterus 15 min following delivery of the placenta (5.93%). All the indicators with high "DK" responses were in the content of childbirth care category [Table 2].
The findings below report the validated quality of childbirth care indicators by category. Three respectful maternity care indicators had accuracy at the individual or facility level. These were (from highest AUC to lowest): the provider introduced him or herself to the woman (AUC = 0.68, 95% CI: 0.63-0.72), the provider encouraged the woman to assume different positions during labor (AUC = 0.63, 95% CI: 0.58-0.68) and, at least once, the provider explained what would happen during labor (AUC = 0.61, 95% CI: 0.56-0.65). The prevalence reported by the women and by the observers was incongruent for these indicators.
For example, 54.29% of the women surveyed reported that the provider introduced his or her name and role, while observers recorded this for only 23.96% of women; specificity was correspondingly low (Sp = 54.34%, 95% CI: 48.92-59.67).
Four respectful maternity care indicators showed population-level accuracy: the provider responded professionally (AUC = 0.58 ± 0.05, IF: 1.08), the provider did not physically abuse the patient (AUC = 0.61 ± 0.06, IF: 1.08), the provider did not abandon the patient without care (AUC = 0.52 ± 0.04, IF: 0.75), and the provider maintained good communication/collaboration (AUC = 0.59 ± 0.05, IF: 1.03). These indicators had a high false positive rate, with high sensitivity (range: 85.86-97.26%) and low specificity (range: 18.82-40.96%). Four respectful maternity care indicators did not meet either of the validity criteria [Table 3]: women provided oral consent before examination (AUC = 0.58 ± 0.05, IF: 1.33), women were allowed to have a companion during delivery (AUC = 0.50 ± 0.04, IF: 1.26), providers did not verbally abuse their patient (AUC = 0.51 ± 0.04, IF: 1.29) and providers treated clients equally without discrimination (AUC = 0.57 ± 0.05, IF: 1.31).

Content of care indicators
Of the 39 routine content of childbirth care signal function indicators, eighteen met both the individual- and population-level criteria. For example, 40% of the women reported receiving an HIV test, which closely approximated the true prevalence of 44%. The validity analysis showed that women were able to accurately report whether they received an HIV test or not (SN: 94%, SP: 81%). Furthermore, the indicator of breastfeeding initiated within the first hour of birth met both validity criteria. This indicator had high sensitivity (88%) and low specificity (17%), suggesting that while most women who initiated breastfeeding in the first hour correctly reported doing so, more than eight out of ten women who did not breastfeed in the first hour falsely reported doing so (83%). Most of these valid indicators had low specificity, indicating a high false positive rate. For example, only 6.0% of the women correctly reported that the health care provider didn't wear sterile gloves during vaginal examination. However, a composite indicator of 5 essential elements of newborn care had low sensitivity (38%) and high specificity (76%), indicating a high false negative rate: 62% of women who did receive all five elements of newborn care did not report receiving those interventions.
The content of care indicators that did not meet either of the validity criteria were: the scale was calibrated and the baby was weighed (AUC = 0.48 ± 0.05, IF: 0.72) and the woman's vulva was cleansed (AUC = 0.59 ± 0.04, IF: 1.99). For the latter, the survey-based prevalence at the population level would be nearly double the prevalence observed in the facilities where data were collected for this study [Table 4]. [Table 5].

Maternal and newborn complications

Maternal complications
Participants were questioned about whether they experienced any of the following conditions either during or immediately following delivery: (1) bleeding, (2) preeclampsia/eclampsia (3) laceration (4) another type of complication (asked to specify), or (5) no complications.
Indicators of women's report of experiencing any type of complication, hemorrhage, laceration and avoiding delays in receiving care met both validity criteria. Nearly 19% of women reported experiencing some type of complication, which exceeded the observed prevalence (15%). Self-reports of experiencing any complication had a sensitivity of 45%, indicating that fewer than half of the women who had experienced a complication reported it. The indicator also had high specificity (85%), reflecting a low rate of false positive reports by women. The indicator of avoiding delays in receiving care had a high specificity (91.82%, 95% CI: 88.64-94.33) but low sensitivity (32.81%, 95% CI: 21.59-45.69). In addition, the complications most commonly reported by mothers were excessive hemorrhage (9.67%), followed by laceration (3.96%).
Three indicators met the individual-level accuracy criterion only: preeclampsia/eclampsia, neonatal complication and newborn death within the facility. The indicator of preeclampsia/eclampsia around birth had low sensitivity (38%) and high specificity (95%) and was accurately classified at the individual level (AUC = 0.62, 95% CI: 0.53-0.70). This shows a high false negative rate and an overestimation at the population level (IF = 1.5).

Newborn outcomes
Mothers were asked whether their newborn babies were faced with any of the following complications during birth: (1) birth asphyxia, (2) still birth (3) infection (4) newborn death within the health facility (5) any other type of complication, or (6) no complications.
Only the birth asphyxia indicator of the newborn complication met both validity criteria (AUC = 0.76, 95% CI: 0.68-0.84, IF: 1.19). Women's reports on birth asphyxia had a sensitivity of 64%, indicating that over one-third of women who had asphyxiated newborns did not report it. However, the indicator had high specificity (95%), reflecting low false positive reports.
The indicator of any neonatal complication met the validity criteria only at the individual level (AUC = 0.71, 95% CI: 0.68-0.84; IF: 1.39), meaning it would be overestimated by a factor of 1.39 at the population level. Likewise, the indicator of newborn death within the facility met only the individual-level criterion; it would be underestimated at the population level (IF = 0.47). The indicator had low sensitivity and high specificity, indicating that not all women whose newborns died at the facility correctly reported it. This might be because mothers were unable to differentiate newborn death from stillbirth. Only the stillbirth indicator did not meet either of the validation criteria (AUC = 0.57 ± 0.07, IF = 1.52). Regarding the perinatal death (stillbirth and neonatal death) indicator, mothers could not reliably report whether the death was a stillbirth or an early newborn death.
Lastly, 17% of women reported that their newborns suffered at least one type of complication, exceeding the observed prevalence (12%) [Table 6].

Discussion
This study tested the validity of key indicators that measure the quality of care received during the intra-partum and immediate postpartum period, indicators that are either currently in use or will be incorporated into household surveys. Such indicators are needed to move beyond measures of nominal facility utilization for delivery to measures of effective coverage of delivery care. Measures of effective coverage weight utilization estimates by the quality of the services used [24]. Several (n = 32) of the quality indicators across phases of labor and delivery met both validity criteria (AUC and IF). Furthermore, an additional 21 indicators met the individual-level criterion only, 13 met the population-level criterion only, and 8 indicators did not meet either criterion. Indicators that did not meet both criteria are not necessarily invalid for all measurement purposes, but should be used in accordance with the rationale for their use [9,25]. Indicators that met both validity criteria could be appropriate for the measurement of quality of care at the facility and population level.
Indicators that do not meet criteria for the AUC but do meet the IF criterion may be suitable to measure intervention coverage at the population-level as false positive and negative reporting balance each other at the aggregate level.
Indicators that meet the AUC criterion but not the IF criterion may be useful for facility-level measurement or individual-level classification but may be over-reported at the population level. For example, in our study the composite indicator of counseling on immediate postpartum care and the indicator of taking a urine sample for a protein test were accurately measured at the individual level but were not correctly reported by women at the population level, because some of these indicators require an understanding of technical terms that mothers find difficult to distinguish.
Results for specific indicators varied when compared with other published findings. For example, for the indicator of women being allowed to have a companion of their choice in labor, most women accurately reported the presence of a companion (i.e., high sensitivity). Our result shows a low specificity for this indicator, which may reflect "facility reporting bias" among women. This finding is not consistent with studies done in Mozambique and Kenya [14,21], which reported high specificity for the indicator of women being allowed to have a companion of choice during labor. This discrepancy could be attributable to the cultural context of mothers, their literacy level and their awareness of the importance of having a companion of choice during labor. In addition, mothers may have under-reported negative experiences at a facility out of concern that providers would withhold proper care at subsequent visits in retaliation for these comments. We also found low sensitivity and high specificity for reported excessive bleeding (hemorrhage); this finding corresponds with levels found among women delivering in Mexico, Indonesia, Benin and the Philippines [9,26-28]. However, these results differ from women's reporting in Ghana [29]. Taken together, the findings suggest that women's understanding and recall of the presence of a companion of choice and of obstetric complications experienced may vary by clinical and cultural context and setting.
The study results indicate that there were challenges in measuring low-prevalence indicators accurately. Given that the calculation of the IF depends upon the indicator's observed prevalence, even a small number of false positive responses can result in overestimation as measured by the IF. A few indicators met the IF criterion only; individual-level misclassification does not inherently signify that measurement at the population level will be inaccurate [21,30,31].
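The dependence of the IF on prevalence can be illustrated numerically. The sketch below holds sensitivity and specificity fixed and varies only the true prevalence, then shows a case in which false positives and false negatives offset each other. All values are hypothetical and chosen for illustration; none are study results.

```python
def inflation_factor(true_prevalence, sensitivity, specificity):
    """Survey-based prevalence divided by true prevalence."""
    survey = true_prevalence * (sensitivity + specificity - 1) + (1 - specificity)
    return survey / true_prevalence

# Same reporting accuracy, progressively rarer indicator.
sensitivity, specificity = 0.80, 0.95
if_by_prevalence = {
    prevalence: inflation_factor(prevalence, sensitivity, specificity)
    for prevalence in (0.50, 0.10, 0.02)
}
for prevalence, value in if_by_prevalence.items():
    print(f"true prevalence {prevalence:.0%}: IF = {value:.2f}")

# Offsetting errors: modest individual-level accuracy (Se = Sp = 0.60),
# yet false positives and false negatives balance at 50% prevalence.
balanced_if = inflation_factor(0.50, 0.60, 0.60)
print(f"balanced errors: IF = {balanced_if:.2f}")
```

Under these assumed values, a 5% false positive rate leaves the estimate slightly low at 50% prevalence (IF about 0.85) but more than triples it at 2% prevalence (IF about 3.25), while the balanced case yields an IF of about 1.0 despite weak individual-level accuracy.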
Evidence shows that knowing whether an indicator is likely to be overestimated can also have significant programmatic implications. For example, where the complication of preeclampsia/eclampsia is over-reported, its true burden as a cause of maternal death at the population level may not be as great as expected. When possible, we recommend that users also triangulate self-reported data on quality of care with facility and other data sources. Other core indicators, such as calibrating the scale and weighing the baby, did not meet any of the validity criteria. This suggests that women may not be able to report accurately when and where the measuring scale was placed and calibrated.
This study has some limitations. The validation results do not include women indicated for cesarean section delivery or mothers who were referred to a higher-level facility with surgical capacity. Furthermore, the tool was administered at discharge (soon after labor) and considers population-level indicators. It may therefore have some deficiencies for household surveys that take place long after delivery.

Conclusion
Women were able to accurately report on several aspects of quality of care indicators received across the phases of child birth and immediately after birth. A few technical indicators tended to have higher don't know responses.
Although high specificity and sensitivity are preferred for all indicators, knowing the estimated survey-based prevalence is helpful, particularly for indicators of very low prevalence which are likely to be overestimated without near perfect specificity. Likewise, in some cases, low sensitivity and specificity cancel out at the population level and may generate acceptable estimates for coverage monitoring purposes, even if they are not appropriate for analysis at the individual or facility level. Therefore, the valid indicators should be included as potential measurements of quality process of labor and delivery to ensure that the essential content of care interventions is delivered.
Additional work on fewer, better metrics of quality for mothers and newborns, through national survey efforts and tracking of the basic lifesaving interventions received at the time of birth, would support the design of a range of context-based quality improvement strategies. Lastly, the valid indicators can be tracked and reported through the health management information system or District Health Information System 2 (DHIS2) for decision making at all levels.