Development and initial validation of a fertility experiences questionnaire

Background Many women throughout the world have history of subfertility (resolved or unresolved), but much remains unknown about services and treatments chosen. Methods We developed a mixed-mode fertility experiences questionnaire (FEQ) in 2009 through literature review and iterative pilot work to optimize question format and mode of administration. The focus of the FEQ is to collect data retrospectively on time at risk for pregnancy, fertility treatments received and declined, pregnancy, time to pregnancy and pregnancy outcomes. We conducted a validation of key elements of the FEQ with comparison to medical records in 2009 and 2010. The validation sample was selected from women initially seen at a specialized fertility treatment center in Utah in 2004. Results The FEQ was optimized with two components: 1) written (paper or web-based), self-administered, followed by 2) telephone- administered questions. In 63 patients analyzed, high levels of correlation were identified between patient self-report and medical records for the use of intrauterine insemination and assisted reproductive technology, pregnancy and live birth histories, time at risk for pregnancy and time to pregnancy. There was low correlation between medical records and self-report for the use of oral ovulation drugs and injectable ovulation drugs. Compared to the medical record, the FEQ was over 90 % sensitive for all elements, except injectable ovulation drugs (70 % sensitivity). Conclusions The FEQ accurately captured elements of fertility treatment history at 5–6 years after the first visit to a specialty clinic. Electronic supplementary material The online version of this article (doi:10.1186/s12978-015-0054-3) contains supplementary material, which is available to authorized users.


Introduction
Subfertility (also called infertility) is a prevalent condition throughout the world [1,2]. Based on the conventional definition of no pregnancy within a year of intercourse without contraception, the prevalence of subfertility is estimated at 15 % or more among couples currently trying to conceive in developed countries [2][3][4]. Many women or couples experiencing subfertility never seek services, and among those who do, the level of services received varies widely [5][6][7]. In addition to accessibility of medical services, demographic and social factors are strongly correlated with the types and volume of fertility treatments received [8,9].
Questionnaires can provide valuable information on many aspects of subfertility, including its epidemiology, psychological aspects, and women's and couple's decisions about treatment. Such information is essential to inform efforts to improve medical treatment and to guide policy regarding the allocation of resources to diminish the societal and economic burden of subfertility [7,10]. However, questionnaire research to date in the field has been largely ad hoc, and many questionnaires have not been validated.
Our goal was to generate a questionnaire instrument that can be used retrospectively to ascertain fertility treatments chosen by women, reasons for choosing or declining different treatments, factors that may have influenced choices of treatments, timing of treatments, and a detailed history of attempts to conceive (by which we mean time at risk for pregnancy, as defined further below). We conducted a validation comparing important components of the questionnaire with data from medical records in a clinical sample.

Design of the questionnaire
We initially conducted a literature review to identify questionnaires that possibly included domains of interest for our research [7,[11][12][13][14][15][16][17][18][19][20]. We also contacted authors to obtain copies of their instruments, where possible. We used items verbatim from some questionnaires [12,13], and adapted items from others (as referenced in Table 1). Based on this review and consultation with experts in the field, we constructed a questionnaire with the domains of interest for our research, called here the fertility experiences questionnaire (FEQ).

Pregnancies and time at risk for pregnancy
We were particularly interested in assessing time at risk for pregnancy, pregnancies, and pregnancy outcomes retrospectively. The "pregnancies and attempts to conceive" component of the FEQ captures information about all time in a woman's life when she was at risk of pregnancy, whether or not she was "trying" to conceive, and whether or not the "attempt" ended in pregnancy. Each time at risk for pregnancy is called an "attempt" to conceive in the FEQ. This written questionnaire contains a definition as well as an illustrative example for "attempts to conceive" to enhance respondents' understanding of the concept of "attempts to conceive." (See Additional file 1 for the full FEQ, which includes the exact definition and examples used in the online portion of the "Attempts to Conceive" section.) Several initial questions are asked in the written questionnaire, with a follow-up telephone interview for further verification and clarification. The goal is to capture as accurately as possible the time a woman was actually at risk of pregnancy. Unlike other time to pregnancy questions that ask a woman to give a general number of months it took her to become pregnant, the "attempts to conceive" section specifically excludes time that a woman was not at risk for pregnancy (for example, due to spousal separation, or desire not to have a baby in a certain month), even though she may have intended to achieve a pregnancy [21]. It also explicitly includes time that a woman was at risk of pregnancy without intending pregnancy [22]. Dates (month and year) are established for specific definitions of beginning an "attempt to conceive" and when the pregnancy occurred, similar to an approach previously used by European investigators in an interview questionnaire, but with some additional details [23]. For "attempts" that ended in pregnancy, there were additional questions about the outcome of the pregnancy.

Other components of the FEQ
Other components of the FEQ addressed fertility evaluation and treatment, women's general health, menstrual and sexual history, and psychological and social factors. All components of the FEQ are outlined in Table 1. The complete FEQ is in the Additional file 1.

Pilot testing
We pilot tested the FEQ in four sequential phases: entirely by self-administration (written on paper, 10 women), entirely by face-to-face interview (5 women), entirely by telephone interview (3 women), and by mixed-mode of administration (written online, followed by interview, 30 women). All participants in the pilot testing were current patients of a specialty fertility clinic, the Utah Center for Reproductive Medicine (UCRM). The 18 women participating in the first three phases were a convenience sample from women being seen in the clinic during the time of the development work. The 30 women in the fourth phase were a random sample and were also included as the first group of women in the validation analysis, described in the next section below.
After each administration of the FEQ, a semi-structured debriefing interview was conducted (in person or by telephone) that explored which questions were perceived as difficult, confusing, or objectionable. Revisions of wording of some items and responses were undertaken after the first three phases. We also reviewed the time required to complete the FEQ and the consistency of responses in these phases. Oral interviews obtained information on time to conceive (as described above) that was more internally consistent than complete self-administration on paper. We also found that we had more complete and internally consistent information regarding pregnancy outcomes, fertility treatments, and self-help measures when these were assessed first in writing, with a follow-up telephone interview. Finally, we found that mixed-mode administration was more efficient than complete oral interviews and yielded internally consistent information on parts of the questionnaire that were not temporally tied to attempts to conceive.

Validation
This validation includes two groups of women. (See Fig. 1) The original group was selected via random sample of women over the age of 18 who had an initial consultation for subfertility generally, or in vitro fertilization (IVF) specifically at the UCRM in the year 2004. There were no other exclusion or inclusion criteria for this group. In 2009, we attempted to contact each woman by telephone three times at different times on a day, Ascertained and linked to attempts to conceive, and whether linked to conception Timed intercourse by counting days, basal body temperature, urine ovulation test kits, cervical mucus or fluid; took herbs, fertility vitamins, or supplements; lost weight; adhered to fertility diets; took a daily drug to enhance fertility; took a drug for ovulation; took hormones like progesterone Adoption experiences X Ever applied for adoption, any adopted children Stress and social situation [12] X Likert scale questions about impact of fertility problems and/or treatment on life, relationships with partner, family, friends; level of support from family, partner, friends; negative reactions from family, partner, friends.
Experience of past fertility treatment [12] X Likert scale questions about perceptions of past treatment: had enough time, shared decision making, feeling listened to, receiving explanations, addressing emotional issues Demographic information X Marital status and date, education, race, ethnicity, country of birth, country of parents' birth, languages spoken, religious preference, occupation, income, whether have written records of fertility experiences, best times to contact by phone Items in these sections of the questionnaire were adapted from questions used in research conducted by Mary Croughan, PhD, University of California, San Francisco evening, and weekend to ask if she would be willing to receive information about the study. If she agreed, we obtained an e-mail address where we sent an e-mail describing this study. Included in the email was an attachment of a consent document as well as a link to the questionnaire and instructions to decline participation if she chose to do so. Women consented by clicking on the link to begin answering the questionnaire. Women who completed the online questionnaire were contacted by telephone for the follow-up interview at a time of their convenience to conduct the telephone component of the FEQ, the follow up interview typically occurred within 2 weeks of completion of the written survey. There was no compensation for these women. A subsequent or expanded group was selected from the remainder of the UCRM patients who were also first seen for infertility or IVF in 2004; this group participated in a follow-on study that had additional eligibility criteria. These women all received a mailing in 2010 inviting them to contact the study investigators if they were interested in participation. Those who did so were screened for eligibility based on primary subfertility (defined by being at risk for pregnancy for 12 months with the same partner without pregnancy and never having a positive pregnancy test prior to being seen at the clinic) as well as living in Utah at the time of their appointment. These women received a $10 gift card for their participation.
For all participants, medical records were obtained from the UCRM for independent chart review to extract key variables for comparison. Review of medical records was done by two physicians at the UCRM (S.G., M.G.), and three medical students under their supervision. Records from any other clinics that patients may have visited in addition to UCRM were not available to us for analysis.
We used data from billing records to compare participants from both groups described above to all patients seen at the UCRM during 2004 (the source population, n = 923) on the characteristics of age, type of visit, and whether they had a subsequent visit for an early pregnancy ultrasound.
For the initial validation, we chose the following variables to compare between the medical chart review and the FEQ: use of oral ovulation enhancing drugs, use of injectable ovulation drugs, use of intrauterine insemination, use of in-vitro fertilization, time at risk for pregnancy, time to pregnancy, pregnancy, and live birth. These represent the outcomes of greatest interest for this questionnaire.

Statistical analysis
We performed correlation analyses to determine the degree of agreement between the specified elements in the questionnaire and the patients' medical records. For categorical variables (history of different types of treatment and pregnancy history), we calculated sensitivity and specificity, and used Cohen's kappa statistic to rate interobserver agreement [24]. As a sensitivity analysis, we also repeated these analyses for the women who never conceived.
From the FEQ interview, we calculated time at risk for pregnancy starting at the woman's report of the beginning of the first attempt to conceive. The end point for the time period was her first clinic visit to UCRM, which is also the time point that we extracted the time attempting to conceive that was recorded in the medical record. Between the start and end points, any time that was reported as not being at risk for pregnancy in the FEQ was subtracted from the interview-based time at risk for pregnancy, but not from the medical recordbased time at risk for pregnancy. For women who had a pregnancy, we also calculated time to pregnancy. For this calculation, the same starting point was used from the FEQ interview, and the end point was the time of pregnancy, as reported by the woman. For the medical record, the time attempting to conceive that was recorded at the first visit was added to the subsequent time until the beginning of the first subsequent pregnancy identified in the medical record. To compare time at risk for pregnancy and time to pregnancy between the FEQ interview and the medical record, we used the Pearson's linear correlation coefficient.

Enrollment and sample characteristics
For the initial sample, we attempted to contact 160 women, randomly sampled from UCRM patients that were first seen in 2004 see (Fig. 1). Eighty-two (54 %) were not locatable. Of the 69 women (46 %) that we located, 61 (88 %) agreed to consider participating. Of these, 34 (56 %) completed the written questionnaire, and of these, 30 (88 %) completed the follow-up interview. Finally, we were able to locate medical records for 26 (87 %) of the women that completed the follow-up interview.
In the expanded sample, there were 763 additional UCRM patients first seen in 2004, of whom 175 responded that they were interested in participating. Ninety-two (53 %) of these women screened eligible. Of these women, 64 (68 %) completed both components of the FEQ. Of these, we were able to locate 41 (64 %) medical records for review. Subsequently, we determined that four of these women were seen at UCRM prior to 2004 and we removed them from the final analysis. Thus, the total sample included in the validation analysis is 63 women: 26 from the initial sample, plus 37 from the expanded sample.
The characteristics of the women enrolled are shown in Table 2, with comparison to all patients seen for first fertility visits at the UCRM in 2004 (where available from billing data). In general, participants were highly educated and with medium to high household income. About 30 % had never had a pregnancy. Participants were slightly younger than clinic patients, with a mean age of 30.4 years, compared to the clinic population mean of 31.8 years (p = 0.066). Participants were probably more likely to have had a successful pregnancy, as 65.0 % of participants had a subsequent visit to evaluate an early pregnancy by ultrasound, whereas 49.0 % of all clinic patients had such an evaluation (p = 0.014).

Treatment history
The agreement between the FEQ and the medical record for different treatments is shown in Table 3. Compared to the medical record, the sensitivity of the FEQ was uniformly higher than specificity. The agreement was good for intrauterine insemination and assisted reproductive technology (kappa 0.64, 95%CI 0.46-0.83; and 0.74, 95%CI 0.57-0.90, respectively), but lower for use of oral ovulation drugs and injectable ovulation drugs (0.41, 95%CI 0.21-0.61 and 0.21,95%CI 0.0.-0.51, respectively). For women who never conceived (n = 29), the respective kappas were very similar: 0.65 for intrauterine insemination, 0.68 for assisted reproductive technology , 0.45 for oral ovulation drugs, and 0.22 for injective ovulation drugs, respectively.

Pregnancy history
The kappa for the agreement for pregnancy history during the time the woman was a patient at UCRM was 0.65. There was perfect concordance for 50 (79.4 %) participants with respect to the number of pregnancies reported in the interview with the number of pregnancies reported in the medical record. Nine women (14.3 %) reported more pregnancies in the interview than were reported in the medical record, while one (1.6 %) reported one fewer pregnancy in the interview than was reported in the medical record. The kappa for the agreement for live births a woman had during the time she was a patient at UCRM was 0.55. 43 (68.3 %) showed perfect concordance, while 12 (19 %) reported more live births in the interview than in the medical record and one (1.6 %) women reported one fewer live birth in the interview than in the medical record.

Time at risk for pregnancy and to time to pregnancy
About half of the medical records did not contain sufficiently detailed information on time attempting to conceive at the first visit. We were able to compare and calculate time at risk for pregnancy for 35 women, and time to pregnancy for 29 of those women. The mean and median time at risk for pregnancy, as reported in the interviews was 42.1 months and 40 months respectively; in the medical record it was 36.4 and 30, respectively, with a Pearson's correlation coefficient of 0.42 (95 % CI: 0.10-0.66) ( Table 4). For time to pregnancy, the Pearson's correlation coefficient was 0.77 (95 % CI: 0.55-0.88) ( Table 4 and Fig. 2).

Discussion
We designed the FEQ to cover time at risk of pregnancy, fertility treatments received, pregnancy outcomes from those treatments, and a wide range of factors that may influence choices about fertility treatments. In this initial validation study, we found that women's responses to the FEQ were reasonably comparable to medical records for total time at risk for pregnancy, time to pregnancy, pregnancy, live birth, and the use of IVF and artificial insemination. However, there was poor correlation between the FEQ and medical records for the use of oral or injectable ovulation drugs. Uniformly, sensitivity was higher than specificity, meaning that women reported many treatments or events that were not found in the medical record. We believe it is likely that some women may have obtained treatments from physicians outside of the UCRM (since we did not have medical record information from other clinics and in the FEQ did not ask women at which clinic they had received each treatment). Alternatively, it is possible that women misunderstood the questions, or that the use of some treatments was not completely recorded in the medical record. Underreporting in the medical record of treatments actually given at the UCRM is possible for drugs, but we believe it is much less likely for procedures such as artificial insemination or IVF.
Although there was substantial agreement in regards to pregnancies a woman achieved while as a patient in the clinic, 9 (14 %) of the women interviewed reported having more pregnancies at the time they were patients at UCRM than those recorded in their medical record. This is consistent with the fact that after receiving fertility treatment at the UCRM, some women go to their own obstetricians, family physicians, or midwives to confirm a pregnancy and receive prenatal care (since prenatal care is not provided at the UCRM). Unless there is some subsequent contact between the women and the UCRM, these pregnancies are not documented in the UCRM records. During this time period, there was routine follow-up from UCRM to women or couples undergoing IVF, but not necessarily for women undergoing other types of fertility treatment. Probably for the same reasons, many women who had live births reported more live births in their interviews than the number  found in their UCRM medical record. Additional validation studies that allow linkage to medical records of all providers seen, or perhaps complete visit and pharmacy billing records, are needed to corroborate our hypothesis that women reported additional fertility treatment and care for pregnancy outside of the single specialty clinic studied. We believe that the most unique and innovative aspect of the FEQ is that it captures a more accurate and complete time at risk for pregnancy (called "attempts to conceive" in the FEQ) than the single measure of time "trying to conceive" reported in the woman's medical record, as evidenced by the 35 (52 %) women in our sample that reported multiple attempts in the time leading up to the first clinic visit (for which the clinic visit identified only one "time trying" to conceive). A woman might report a longer duration of trying to conceive during her first clinic visit than exists in reality, either because she is anxious to start treatment or because she did not take into account any gaps between her attempts to conceive (such as miscarriages, times of separation, or temporary use of birth control). On the other hand, a woman may also report a shorter duration than her actual time attempting to conceive. This may occur when a woman started having sexual intercourse without contraception thereby putting herself at risk of pregnancy but did not consider herself to be "trying" to achieve pregnancy until a later point of time. For these reasons, we have reported the correlation between the FEQ interview and the medical record for time at risk for pregnancy, but did not consider the medical record as a "gold standard" for this information.
In the development and pilot testing of the FEQ, we found that a combination of written or online questions, followed by a clarifying telephone interview seemed to assure adequate understanding of the concept of "attempt to conceive." We found that the written description of "attempt to conceive" (as found in the full FEQ shown in the Additional file 1) was often not sufficiently understood in the written questionnaire alone. This was the key issue that led us to develop the FEQ in a mixed mode, two-step format, with the interview component focused on clarifying and expanding the information specifically tied to each attempt to conceive. While this makes the FEQ more resource-intensive to administer than an online-only (or written-only) questionnaire, we anticipate that for some research aims and settings, it may be worthwhile to collect this detailed information on time at risk for pregnancy. Other investigators have also used interviews to assess information about time at risk of pregnancy across different pregnancies and attempts [23]. While time at risk for pregnancy assessed retrospectively may be subject to recall bias, others have shown that the reliability of time to pregnancy recall is enhanced when women are queried in-person or via telephone or e-mail [21,25,26]. We believe this supports the value of the two-stage approach to assessing time at risk for pregnancy we have employed in the FEQ. However, for other research aims and settings, the written (online or paper) questionnaire alone may suffice. We plan future research to assess the two-stage, mixed mode assessment in a prospective cohort.
The FEQ also contains many additional elements not validated in this study, although many of them have been used, and some validated, in other studies (see Table 1). Not all of these elements may be pertinent for other research uses where the FEQ may be useful. Women who participated in this study were not completely reflective of all patients seen at the UCRM in 2004. Among the original sample, 30 of 69 (43 %) women contacted ultimately completed both components of FEQ; while in the expanded sample 68 % of those eligible completed both components. The difference in response rate may be related to the $10 compensation provided to the latter group. Those that participated in this validation study were probably more likely to have conceived while under care at UCRM, as we infer from the higher proportion of participants who received visits for early ultrasound assessment of pregnancy ( Table 2). Women who had experienced success were likely more motivated to participate in the study. We cannot exclude the possibility that recall of events may be less complete for women who did not conceive with treatment. There was limited socioeconomic and ethnic diversity in the clinic population and the study sample. It would be useful to focus further validation studies on women from broader cultural and socioeconomic backgrounds, and to include women who did not conceive with treatment.

Conclusions
This study provides initial validation for key elements elements of a mixed mode questionnaire that can be used to improve our understanding of the epidemiology of subfertility in both clinical and population settings, with focus on the natural history of time at risk at pregnancy, and its relationship to fertility treatments and pregnancy outcomes. Our results show high sensitivity for medical treatments, pregnancies, and live births. As suggested by other investigators, we have parsed the time aspect for each attempt to conceive, any fertility treatments received, and any pregnancy each woman may have had in the construction of the questionnaire so that misclassification and bias is minimized to the extent feasible in a retrospective questionnaire [27]. We encourage additional validation from prospective studies. Such information can help inform efforts to better to serve the needs of subfertile couples in the clinical and public health setting.

Additional file
Additional file 1: Fertility experiences woman's questionnaire.