To evaluate the accuracy of hysterosalpingography (HSG) compared with hysteroscopy and/or laparoscopy, and to assess intra- and inter-observer variability.
Two hundred infertile women underwent hysterosalpingography, hysteroscopy and/or laparoscopy as part of an infertility work-up. HSG examinations were retrospectively reviewed by three radiologists, and inter-observer variability was compared. Differences between two readings of the same examination, performed three months apart, were used to calculate intra-observer variability.
The final HSG diagnosis was compared with the hysteroscopy and/or laparoscopy findings. The overall sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy of each HSG diagnosis were assessed.
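As an illustration of how these five measures relate to one another, the sketch below derives them from a 2x2 table of HSG results against the reference standard. The function name `diagnostic_metrics` and the counts are hypothetical and chosen only for the example; they are not data from this study, and the study's own statistical software is not stated.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(tp, fp, fn, tn):
    """Compute diagnostic accuracy measures from 2x2 counts.

    tp: HSG positive, reference standard positive
    fp: HSG positive, reference standard negative
    fn: HSG negative, reference standard positive
    tn: HSG negative, reference standard negative
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical counts for illustration only (not study data).
metrics = diagnostic_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are conditioned on the reference standard, whereas PPV and NPV are conditioned on the HSG result and therefore depend on how common the finding is in the study population.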
Intra-observer reliability was variable: observer 1 (k = 0.21), observer 2 (k = 0.57) and observer 3 (k = 0.65). Agreement was highest for the detection of a normal uterus, normal tubes and uterine filling defects, and lowest for the detection of uterine and pelvic adhesions.
First-round results showed moderate agreement between the three pairs of radiologists (k = 0.42-0.53). Second-round results showed substantial agreement for observer 1 (k = 0.62), while agreement between radiologists 2 and 3 was moderate (k = 0.44).
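For readers unfamiliar with the statistic, the sketch below shows how an agreement coefficient of this kind can be computed for one pair of readings. It assumes the reported k values are Cohen's kappa and that the verbal labels follow the Landis and Koch bands; neither is stated explicitly in the text. The function names and the ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of cases on which the two readings agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each reading's category frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readings use a single identical category
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Verbal label for kappa, assuming the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical readings of ten examinations (not study data).
reader_1 = ["normal", "normal", "abnormal", "abnormal", "normal",
            "abnormal", "normal", "normal", "abnormal", "normal"]
reader_2 = ["normal", "abnormal", "abnormal", "abnormal", "normal",
            "normal", "normal", "normal", "abnormal", "normal"]

kappa = cohen_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.2f} ({agreement_label(kappa)})")
```

The same function serves for intra-observer reliability when the two sequences are one reader's first and second readings of the same examinations. In this example the readers agree on 8 of 10 cases, but because 52% agreement would be expected by chance, kappa is considerably lower than the raw agreement rate.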
Using the consensus diagnosis of all readers, the overall accuracy of HSG in diagnosing tubal pathology and uterine cavity lesions was 93% and 85%, respectively. The lowest accuracy was seen for uterine adhesions (71%).
HSG is more accurate in evaluating the tubes than in assessing the uterine cavity. HSG interpretation is somewhat subjective: although experience and training may improve reporting skills and interpretation results, considerable observer variability exists. The gynecologist should interpret HSG results carefully and base further management on comprehensive clinical and radiological data.