NEJM IVF PGS

Jesus, if we don’t let the acronyms fly around here. I volunteered to look through the article that’s going around on IVF with PGS outcomes from a statistician’s point of view for Bea. My qualifications for doing so are that a) I am a competent statistician and b) I’m not doing IVF with PGS, so I don’t have any personal bias. I am not a medical doctor, but I am a researcher and I have a fairly good grasp of what makes a paper valid or not. If you want technical details about the actual research that was conducted, the protocols and what-not, I’m not the person to ask.

There are several basic things that I look for in any scientific paper, and I’m going to step you through one by one. I’m sure other scientifically-minded folks might have their own interpretations, but this is how I read the article.

Hypothesis being tested

Hypothesis testing is a basic form of statistical inference. That is, you take test results from a small sample of individuals and try to generate conclusions about how the entire population behaves based on the sample results. In this study, the research team was interested in determining whether PGS would increase the ongoing pregnancy rate (defined as a “viable intrauterine pregnancy after 12 weeks gestation”) for women with advanced maternal age over the course of three IVF cycles. The baseline rate for the clinical population was 40%; the research team decided that the study results needed to yield a 55% rate to be clinically significant.

In other words, the research team was trying to decide whether it was helpful to use PGS as a standard protocol in all IVF cycles. The study needed to show that including PGS in an IVF cycle yielded a 55% pregnancy rate over a course of three IVF cycles in order to be considered “successful”.
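For anyone who wants to see what a 40% vs. 55% comparison demands in practice, here is a rough sketch of the usual sample-size arithmetic for comparing two proportions. The 40% and 55% figures are from the article; the 5% two-sided significance level and the 80% power target are my assumptions for illustration, and I am not claiming this is the calculation the research team actually ran.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate number of women needed in each group.

    Standard normal-approximation formula for a two-sided comparison of
    two proportions. alpha and power are illustrative assumptions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    p_bar = (p_control + p_treatment) / 2          # average rate if there is no effect
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(
        p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    )
    return ceil((null_term + alt_term) ** 2 / (p_treatment - p_control) ** 2)


# 40% baseline and 55% target are from the article; 80% power is assumed.
n_per_group = sample_size_per_group(0.40, 0.55)
print(f"Roughly {n_per_group} women per group")
```

The takeaway is that the smaller the difference you want to detect, the more women you need: rerun it with 0.40 and 0.45 and the required group size balloons.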

Methodology of the research

Methodology describes the steps taken to produce the research results. The standard rule is that your methodology has to be described in enough detail that another researcher can understand and reproduce your testing method. The NEJM is a peer-reviewed research journal. That means that when a paper is submitted for possible publication, the journal editors pass out the article to several other researchers in the field to see what they think. Those experts review the methodology, and they must approve that the method is sound before a paper will be published. With that in mind, I’m going to accept that their methodology was proper, since I am not an expert in this subject.

However, from a patient’s point of view, there are a few things that stand out to me. First, and Bea and Aurelia went into this in detail, the patient population was defined simply as those with “advanced maternal age”, i.e., over 35. The paper, though not the article, also notes that other conditions were in play: poor semen quality, unexplained infertility, tubal issues, anovulation, endometriosis, cervical issues, and ovarian failure*. There was no control for any of those factors. That means that there might have been different results if the test population had focused on only semen quality, or only tubal issues. Since no controls were in place to account for the variation in baseline diagnoses, no statements can be made about the effectiveness of using PGS for any specific condition.

Results

Results are, well, results. Did it work? Did it not work? In this case, the results were measured in ongoing pregnancy rates at 12 weeks gestation, biochemical pregnancies (positive beta), and clinical pregnancies (visible gestational sac at 7 weeks). Miscarriage rates and live birth rates were also counted. The only issue here is that some of these results don’t line up well. For example, a single biochemical pregnancy might actually result in two live births. It was also noted that the study experienced 8 “spontaneous” pregnancies (how the hell you have a spontaneous pregnancy in an IVF cycle, I don’t know) and that any cryopreserved embryos were transferred before a fresh IVF cycle was begun. From the paper, there is no indication of how those numbers were included or excluded from the results. Because of these kinds of issues, the numbers reported don’t add up directly.

Conclusions

The conclusions in any paper are where the researchers lay out how they think that the results do or do not support their hypotheses. So here is where you have to be very careful: conclusions can be influenced by researcher bias. Conclusions can also be manipulated by the statistical tests performed on the results. There’s an old saying – that figures lie and liars figure. And it’s true. You can make statistical results say anything you want depending on the test that you use.

Now, the researchers’ overall conclusion was twofold: first, that PGS did not increase the ongoing-pregnancy and live birth rates. This is obviously true when you look at the numbers reported in their results. The second assertion is that PGS “significantly reduced” the ongoing-pregnancy and live birth rates. And this is what I think is causing a lot of people to misinterpret their conclusion.

Let’s sidetrack into the definition of “statistical significance” here. For a statistician, the word “significant” has a special meaning. Instead of the common definition, meaning that it is something important or something to pay attention to, statistically significant refers to the Type I error probability in the hypothesis test. Now I’m going to go very slowly:

o For this experiment, the null hypothesis was that PGS doesn’t improve ongoing pregnancy rates**

o Type I error is a “false positive”. In other words, if we have a Type I error, we decide that PGS improves pg rates when it really doesn’t.

o The researchers chose to test against a 5% level of significance. This means that if PGS really doesn’t improve pg rates, there is only a 5% chance that the researchers will decide that it does.

o A 5% significance level has no real meaning, other than being what the research team chose to test against. A decision that is statistically significant at 5% may or may not be significant at 3% or 1% or 0.5%.

o The researchers did not provide any sensitivity testing results on their statistical output. We also do not have enough data about the true underlying population characteristics. Therefore, we cannot draw any conclusions about the true power*** of their experiment.
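To make the point about significance levels concrete, here is a small sketch using a pooled two-proportion z-test. The counts are hypothetical numbers I made up for illustration, not the study's data, and I don't know which test the researchers actually ran; this is just one common choice. The test here is two-sided, which is also an assumption on my part.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Under the null hypothesis both groups share one underlying rate.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts, NOT the study's data: 50 of 200 vs. 72 of 200 pregnancies.
p_value = two_proportion_p_value(50, 200, 72, 200)
for alpha in (0.05, 0.03, 0.01):
    verdict = "significant" if p_value < alpha else "not significant"
    print(f"alpha = {alpha:.2f}: {verdict} (p = {p_value:.4f})")
```

The same made-up result passes at the 5% and 3% levels and fails at 1%, which is why the chosen level matters when you read the word “significant”.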

Still with me? Okay, since we cannot draw any conclusions about the power of their experiment, it is very dangerous to throw out the conclusion that PGS significantly reduces pg rates. There is also no information on how the statistical studies went about controlling for variation due to underlying personal factors. The double-blind trial groups were selected randomly, controlling for maternal age, IVF vs. ICSI, and center location (two medical centers in different locations participated). In theory, it’s possible that a higher proportion of women in the PGS group also had higher FSH levels and therefore poorer-quality eggs. The numbers themselves do show that the PGS group had a higher raw number of women with unexplained IF, tubal IF, anovulation, and ovarian failure.

The other thing that is troublesome about this second statement is that the results were statistically significant only over the total sample size. That is, the researchers could only make the statement that PGS reduces pg rates when they looked at the overall numbers. On a cycle-by-cycle basis, “the ongoing-pregnancy rates and the rates of biochemical and clinical pregnancy in the two groups were not significantly different.” Yep, you heard me. The researchers buried that little sentence in the results section, but it’s there. The PGS rates weren’t significantly better, but they didn’t seem to hurt anything either.
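Here is a rough illustration of how the same gap in rates can be “not significant” in each cycle and “significant” once the cycles are lumped together. All counts are hypothetical, not taken from the paper, and treating the three cycles as independent groups of the same size is a simplification I am making for the sketch (in a real trial the same women show up in more than one cycle).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (successes_a / n_a - successes_b / n_b) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical single cycle: 18 of 70 pregnancies vs. 25 of 70.
per_cycle_p = two_proportion_p_value(18, 70, 25, 70)

# Three such cycles pooled (simplifying assumption): 54 of 210 vs. 75 of 210.
pooled_p = two_proportion_p_value(3 * 18, 3 * 70, 3 * 25, 3 * 70)

print(f"single cycle: p = {per_cycle_p:.3f}")
print(f"three cycles pooled: p = {pooled_p:.3f}")
```

The rates are identical in both comparisons; only the sample size changes. A bigger sample shrinks the standard error, so a difference that looks like noise in one cycle can cross the 5% line in the total.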

My take on their study is that they did some things right, and they did some things wrong. I feel like they proved that PGS doesn’t increase pg rates in the general population of advanced maternal age IVF candidates and so should not be added to the standard of practice “just because”. On the other hand, they have NOT conclusively proved that there is anything about PGS that is detrimental to the IVF success process on a single-cycle basis, and so that needs further research before making the kinds of assertions that they did in their conclusion. If it was me, and my personal RE told me that it would help in my particular circumstances, I would definitely go ahead with PGS with no qualms. The study is simply too broad to use as a decision-breaking piece of research. There are still too many holes in their data that need to be filled.

One thing that is important to remember when you see articles like this is that this represents basic research into a subject. It’s one of the first times that an experiment like this has been tried, and it will most certainly spawn controversy in the medical establishment for the very reasons we’re all picking it apart. But what it does is lay a foundation for other researchers to come back and start testing various parts of the overall experiment to resolve the inconsistencies and variation that were identified. It may not help us, since these tests take years (and sometimes decades) to complete, but the overall body of knowledge will eventually be generated.

————————————-

*I do want to point out that I think it was highly unethical for the study to use donor eggs from women with advanced maternal age to treat the women who were facing ovarian failure. I hope to god that those women knew the risk they were taking by using eggs from donors that also might have egg quality issues.

**Trust me here. If you want me to explain why this is our null hypothesis and the related concept of power, please email me.

***Again, power has a different definition for statisticians.

8 thoughts on “NEJM IVF PGS”

MLO says:

July 10, 2007 at 1:20 pm

Lovely description to a flawed study! Isn’t it amazing how much flawed information is out there? After doing one too many experimental design classes in college, I always look for actual statistics and analysis method in the abstracts.

I think it was the blog “Every little thing” that had an excellent roundup of last week’s presentations at some reproductive conference. It was amazing to me that some of those studies even got published!

Have you ever read Larry Gonick’s “Cartoon Guide to Statistics”? It is a great guide to stats that I am thinking of reviewing on my blog. (I have to actually read it instead of “peruse” it, to do that.)

Pax,

MLO

M says:

July 10, 2007 at 3:04 pm

Thank you for clarifying. I liked statistics…but wasn’t that good at it. You really did make the break down sound simple and understandable. I had come to a couple of those conclusions myself, but not with an educated reasoning. Again…well done and thank you.

Adrienne says:

July 10, 2007 at 8:24 pm

PGD is the only reason I would do IVF, given my chromosomal issues. With that in mind, and maybe I’m wrong for feeling this way, but if any procedure has a chance of shrinking my chances by that much, and there are no studies out yet that indicate it increases my chances (so that they cancel each other out), I’m not taking the risk.

Flicka says:

July 10, 2007 at 9:45 pm

I am not a statistician but I do have an article I’d love you to take a look at, if you’ve got the time!

Aurelia says:

July 10, 2007 at 9:54 pm

Thank you sharah for this, I am not as well versed in stats obviously, so I couldn’t do this, but this is very good. I simply do not understand why they couldn’t culture the tissue after each miscarriage in the study. They could’ve discovered so so much, idiots.

Interesting thing, there have been studies on PGD on IVF previously, with completely opposite results. They were not randomized control studies, but they were huge.

http://www.medpagetoday.com/OBGYN/Infertility/tb/1948

http://www.medpagetoday.com/OBGYN/2005ASRMMeeting/tb/1963

Dr. Cohen for example, reports a database in N.J. with over 33,000 test results and samples. The miscarriage rates were lowered to only 8-13%, and in older women, well, that IS significant.

And did they break out donor egg results separately? Were they younger eggs in older uteruses or older ones? Just because this is the big argument competing with PGD. The REs make more money selling donor eggs than doing PGD testing, so if PGD works their income drops. Sort of a bias if you ask me.

Schatzi says:

July 10, 2007 at 10:58 pm

Thank you, thank you, thank you for going through this study and explaining what their stats really showed. I appreciate you analyzing it for us!

sharah says:

July 11, 2007 at 7:11 am

Aurelia, I can’t find anywhere that they reported the results of the donor eggs separately. The article simply states that 3 women, 2 in the PGS group and 1 in the control group, experienced infertility related to ovarian failure. The footnote to Table 3 says that “donated oocytes from women of advanced maternal age were used in these cases”.

Flicka, send me what you’ve got — I may not get to it today, but I’ll get it as fast as I can.

megan says:

July 11, 2007 at 2:00 pm

the health librarian approves of your evaluation!! 🙂 seriously, well done. you can’t take ANY study at abstract value….particularly those that actually make it into the news cycles.
randomly assigning PGD to people who may not need it also seems reckless to me….it’s no wonder people backed out of their consent after a time…