It seems like a long time since my previous post, but I now see it was April 27, 2022, less than 2 months ago. I cannot decide whether I have been deteriorating at an increasing rate over the last year or so. I seem to be battling chronic fatigue and brain fog more frequently. That, of course, degrades my ability to research and write. I had hoped that my new tactic of incidental tone checks, stretching and periodically getting out of the reclining work-bed where I spend my time would go unnoticed by my battered old body, and so I might avoid a CFS crash by not spending my available daily vitality ration in one session.

Perhaps I am simply suffering the general baby boomer's idiotic delusion that the less-pleasant aspects of life (e.g., aging) do not happen to those who lived during the bright, young, sunshiny days of the late 1960's. I am on my motorcycle on a summer morning in 1968 and feel the caress of the sun, not unwelcome back then with reasonably cool air prior to the oppressive heat, fires and storms that are symptoms of a global fever of an Earth infected by bipeds today, and hear *Spanky and Our Gang* singing "Lazy Day" in my mind and wonder what happened to me, to us collectively.

I apologize to those who were snared by that southeast Asia meatgrinder, i.e., those who may have less than pleasant memories of the 1960's, early 1970's. I did know one fellow who frequented the local park where we all hung out who kept volunteering to return for another tour in the Nam, finding the entertainment there worth the risk. Ironically, he got shot at the park by a young redneck, after which he beat the hell out of the shooter. Don't know if he returned to Nam again, but I didn't see him again. I remember another acquaintance, Joe, a motorcycle club buddy. In 1968 we all used to pile into an old Plymouth after Billy, the lanky hillbilly owner, bolted on a four-barrel carburetor used only for expeditions, and cruise to Juarez, Mexico on Friday nights. Joe and I would sing Neil Diamond's Solitary Man and strum acoustic guitars wedged between seven or eight of us crammed into the old car, windows open and night air whipping us. He joined the Marines to get into the fight and came back with an edginess he formerly lacked. I heard he went into law enforcement. I hope he had a good life after that.

At any rate, I am seized more and more by the feeling that I am adrift in time, geologic time. I glance out at the desert to the west and the volcano in the distance. That dead volcano (now surrounded by upscale homes on the south side) is typical of those appearing near a rift. In this case the rift is the Rio Grande Rift, which I theorize may have been induced to begin tearing apart by the Chicxulub asteroid strike 65 mya (see my article at Nonantipodal Chicxulub impact seismic wave).

I try to put my limited existence in some kind of perspective ("the quest for meaning," think that was the subtitle of some psychology textbook I had), aware of the evolution of the universe and life on Earth over billions of years. It is difficult to comprehend. Well, more difficult if you have experienced phenomena that violate the assumptions of materialism and so cannot so easily accept the current concept of man as no more than animal. Otherwise I suppose I could adopt the apparent slightly depressed resignation of the younger generations, those without religion. Belief grounded in experience is not so easily discarded, however.

I smile regarding a conversation with a friend a quarter century ago or so, Susan responding to my discussion of reincarnation, eyes twinkling with high intelligence and humor, that "it would be better not to come back as pond-scum" (I agree). Obviously I have not reached the type of enlightenment which results in viewing any particular existence as acceptable ("or I may simply be a single drop of rain...but I will remain, I'll be back again", Johnny Cash singing with The Highwaymen).

Sometimes I wonder if my complaints about the state of the world are abhorrent to God. I recall Isaiah 45:9 (KJV) "Woe unto him that striveth with his Maker! Let the potsherd strive with the potsherds of the earth. Shall the clay say to him that fashioneth it, What makest thou?" But, if I, a man, can envision a kingdom of god (i.e., a world with more good than evil, more intelligence than stupidity, more wise men than brute beasts and a small population united in a civilization dedicated to the development of individual capability, increase of happiness and knowledge rather than the insatiable urge for pointless and pernicious acquisition), how can there be instead only mindless predation, which with humans has been perfected in the expression of unlimited greed and narcissism, despoiling the gift of increased consciousness on the upper rungs of the evolutionary ladder?

It is not uncommon for the thinking man to be dismayed by the state of civilization. For example, one could read Albert Einstein's March 3, 1947 letter to his longtime friend, Max Born (in *The Born-Einstein Letters*, correspondence between Max and Hedwig Born and Albert Einstein 1916 - 1955, translated by Irene Born). Born, who received a Nobel Prize in Physics in 1954 for his work decades before in quantum mechanics, had frequent disagreements with Einstein about the validity of quantum mechanics, which was ironic, considering that Einstein, Planck and Bohr are considered to be the fathers of the quantum theory (see the largely scientific biography of Einstein, *Subtle is the Lord*, by Abraham Pais, a physicist who was a young colleague of Einstein's at the Institute for Advanced Study after 1947):

(Einstein writing here to Born)...I am quite convinced that someone will eventually come up with a theory whose objects, connected by laws, are not probabilities, but considered facts, as used to be taken for granted until quite recently [Einstein is talking about his objection to the statistical interpretation of quantum mechanics, for which Born finally received the Nobel Prize in 1954]...I am glad that your life and work are fruitful and satisfying. This helps one to bear the craziness of the people who determine the fate of homo sapiens (so-called) on the grand scale. Maybe it has never been any better, but one did not see it as clearly in all its wretchedness, nor were the consequences of the bungling quite as catastrophic as under present conditions.

I note that Abraham Pais, mentioned above, with Gell-Mann, in 1955 proposed that the heavy boson K⁰ (they referred to it then as θ⁰) and its antimatter partner anti-K⁰ can transform into one another. See *Physical Review* Volume 97, Number 5, March 1, 1955, *Behavior of Neutral Particles under Charge Conjugation*, or discussion in §4.11.1 of Perkins *Introduction to High Energy Physics*, or §11-5 Feynman Lectures on Physics, Volume III Quantum Mechanics.

Returning to comments regarding the status of civilization, more recently [c. 2017], Peter Woit (a mathematician and physicist at Columbia, talking with John Horgan at blogs.scientificamerican.com), said

"I've always liked the Antonio Gramsci slogan [written by Gramsci, an Italian politician and philosopher, while he was in prison in 1927] 'pessimism of the intellect, optimism of the will.' [However] As for the future of humanity, the collapse of any semblance of a healthy democracy in the US... with the advent and triumph of "post-truth" politics has for me (and I'm sure many others) made it much harder to be an optimist. The longer-term trend of increasing concentration of wealth and power in the hands of a minority seems unstoppable. The "disruptive innovation" of our new Silicon Valley overlords and brave new world of social media and omnipresent digital monitoring of our existence is starting to make some of the dystopias of science fiction look frighteningly plausible. I'm still waiting for the future of peace, love and understanding promised when I came of age during the late 1960s."

If you think you lack personal experience that opposes the idea that man is no more than animal, I will ask you to consider what some of the highest-level minds of the twentieth century said about mathematics and physics in that context. I would say though that every human does experience things inexplicable in terms of materialism, but most are too distracted to notice, or have been successfully indoctrinated (by the soul-hating academics in the educational institutions of the current age) to repress their memory of those types of occurrences. Grace, those gifts of inexplicable knowing in daily life which momentarily lift the veil on the supernatural ground underlying the material world which is the stage on which the great themes of our existences are performed, and sometimes include the explicit shaping of events at critical points, is sufficient to help us on our way in this existence---provided we cooperate with it---accordingly, we treat no event as intrinsically insignificant.

Eugene Wigner (Nobel Prize in Physics 1963, for the discovery and application of fundamental mathematical symmetry principles contributing to the theory of the atomic nucleus and elementary particles in the 1930's, really the father of QFT, Quantum Field Theory, along with Pascual Jordan and P.A.M. Dirac; it should be noted that Wigner told C. Yang, Nobel laureate 1957, that he had not dreamed the second quantized 𝜓 "could be used in real physics" until Enrico Fermi applied it in his 1934 𝛽-decay theory to "create an electron in the nucleus") wrote, in The Unreasonable Effectiveness of Mathematics in the Natural Sciences (1960), that "the enormous usefulness of mathematics in the natural sciences is something bordering on the mysterious and...there is no rational explanation for that" and that the principal emphasis in mathematics is on the invention of concepts, which in the case of the more advanced concepts (e.g., the complex numbers which advanced the 19th century physics which had been built solely on real numbers) are not suggested by any entities directly encountered in the world (unlike Euclidean geometry, which was abstracted from relations visible in the world of experience, lines, circles, areas, etc.; for an easily understood discussion of the transition from geometry related to general experience to abstracted systems, see an expanded version of Einstein's January 27, 1921 Address to the Prussian Academy of Sciences, "Geometry and Experience" in "Sidelights on Relativity": EBook #7333 at www.gutenberg.org).

Wigner went on to say that "...it is hard to believe that our reasoning power was brought, by Darwin's process of natural selection, to the perfection which it seems to possess" and "...it is not at all natural that 'laws of nature' exist, much less that man is able to discover them" (in a footnote Wigner adds that Erwin Schrödinger, Nobel Prize in Physics 1933 for developing wave mechanics, wrote that this miracle may be beyond human understanding).

I should acknowledge that those who adhere to the present world-view (see my discussion of mathematician Kurt Gödel's 1961 thinking on this Weltanschauung in my October 13, 2021 post; at the age of 25 in 1931, Gödel completely changed the subject of logic with his incompleteness theorem, which put an end to David Hilbert's program to find a set of axioms sufficient for all mathematics, something I discussed in more detail in a 2019 post) of materialism would sneer at the suggestion of elevation of man as an animal on any ladder of evolution, much less claiming transcendent capabilities for man. For example, consider the following from the National Academies Press, In the Light of Evolution III: Two Centuries of Darwin, Chapter 14 Darwinian Revolution: Rethinking Its Meaning and Significance, by Michael Ruse:

"...debate over 'man's place in nature.' ...today we would never dare to use that kind of language...At some level, the Darwinian revolution destroyed forever the old picture of humans as somehow miraculously special, symbolically and literally as touched by magic. Admittedly, to this day Christian fundamentalists (and those of other religions) refuse to accept this, but it is true [sic]. Even if you think that you can still be religious, a Christian even, you have to rethink dramatically, emotionally even more than intellectually, what it means to be a human....It is not just a question of who we are but also of how we should live our lives. Although it is hardly the only factor, Darwinian thinking is at the center of the move to modernism, in some broad sense. Are we still to be subject to the old ways (women inferior, gays persecuted, abortion banned) or are we to look forward to a truly post-Enlightenment world, with reason and evidence making the running in an entirely secular fashion?...Ultimately, natural selection is not a progress-producing mechanism. So we could say that the Darwinian revolution does prove the nonspecial status of humans, and finally today people recognize the fact [sic]. However, this may not be the entire truth. A case can be made for saying that still today the popular perception is of progress leading to humans."

We must note that Darwin would not have agreed with Ruse's interpretation of man's place (as Ruse seems to admit later, but quickly dismisses), Darwin writing in Chapter XXI of his The Descent of Man and Selection in Relation to Sex (1871 first published, 1874 second edition):

"We have seen that man incessantly presents individual differences in all parts of his body and in his mental faculties. These differences or variations seem to be induced by the same general causes, and to obey the same laws as with the lower animals. In both cases similar laws of inheritance prevail. Man ...is occasionally subjected to a severe struggle for existence, and natural selection will have effected whatever lies within its scope. A succession of strongly-marked variations of a similar nature is by no means requisite; slight fluctuating differences in the individual suffice for the work of natural selection; not that we have any reason to suppose that in the same species, all parts of the organisation tend to vary to the same degree.

The high standard of our intellectual powers and moral disposition is the greatest difficulty which presents itself, after we have been driven to this conclusion on the origin of man. But every one who admits the principle of evolution, must see that the mental powers of the higher animals, which are the same in kind with those of man, though so different in degree, are capable of advancement. Thus the interval between the mental powers of one of the higher apes and of a fish, or between those of an ant and scale-insect, is immense; yet their development does not offer any special difficulty; for with our domesticated animals, the mental faculties are certainly variable, and the variations are inherited. No one doubts that they are of the utmost importance to animals in a state of nature. Therefore the conditions are favourable for their development through natural selection. The same conclusion may be extended to man; the intellect must have been all-important to him, even at a very remote period, as enabling him to invent and use language, to make weapons, tools, traps, etc., whereby with the aid of his social habits, he long ago became the most dominant of all living creatures.

A great stride in the development of the intellect will have followed, as soon as the half-art and half-instinct of language came into use; for the continued use of language will have reacted on the brain and produced an inherited effect; and this again will have reacted on the improvement of language. As Mr. Chauncey Wright (1. ‘On the Limits of Natural Selection,’ in the ‘North American Review,’ Oct. 1870, p. 295.) has well remarked, the largeness of the brain in man relatively to his body, compared with the lower animals, may be attributed in chief part to the early use of some simple form of language,—that wonderful engine which affixes signs to all sorts of objects and qualities, and excites trains of thought which would never arise from the mere impression of the senses, or if they did arise could not be followed out. The higher intellectual powers of man, such as those of ratiocination, abstraction, self-consciousness, etc., probably follow from the continued improvement and exercise of the other mental faculties."

[end Darwin quote] It appears to me that Consciousness (a superhuman version, God if you will; as I have hinted, my own life's experience persuades me that this God is not inactive in the lives of individuals) brought forth the Universe and that the physics (the natural laws and constants) of this universe leads inevitably to the emergence and progressive development of conscious beings. It is a tautology that any universe that is observed by conscious beings must necessarily be governed by laws of physics that permit the evolution of such observers, but I do not intend this in the anthropic principle sense where many random universes are postulated and we of course find ourselves in one of those few that supports our existence.

We can literally see back in time (through, for example, light that was emitted a few hundred thousand years after the Big Bang) well enough to know that the Universe had a beginning. That beginning was the Big Bang around 13.75 Gyr, i.e., 13.75 billion years ago, quoting a 2011 cosmology review by Cervantes-Cota and Smoot, downloadable at arXiv.org, a repository for scientific papers in various fields (pronounced "archive", because the X stands for the Greek letter Χ, chi, which is pronounced with a hard k sound). It seems reasonable to expect that this beginning had a cause, a cause necessarily external to it, that is, not governed by physical law (because there was no physical law yet).

To suggest that instead of a single Big Bang event, there is merely an ever-growing nucleation of new universes in a multiverse froth (eternal inflation and Coleman-De Luccia bubble nucleation) is not a scientific proposition, but merely a transparent attempt to avoid the implications of a beginning and to salvage the bizarrely impotent string theory.

Without digressing too much, Peter Woit (degrees in physics from Harvard and Princeton and has taught mathematics at Columbia since 1989) says that string theorists don't actually have a theory, but rather an approximation to an unknown theory proposed to be valid within certain limits and a list of properties they would expect the unknown theory to possess. Accordingly, there is no way to tell if you have a solution to string theory (if you are on "dry land") or a non-solution to string theory (aptly, if ironically, termed "Swampland" by a string theorist). The Large Hadron Collider (which discovered the Higgs boson) found no evidence of extra dimensions or supersymmetry that would have been possibly consistent with string theory, i.e., string theory is unable to explain any known phenomena. Over the years, string theorists began to claim that their huge collection of unverifiable solutions to an unverifiable theory implied that there is a "Landscape" of bubble or local "pocket universes" (within a Multiverse), each conforming to one of these otherwise unseen string theory solutions for the vacuum (this also "explains" why the QFT estimate for the vacuum energy equivalent of the cosmological constant that characterizes the present accelerating expansion of our universe is 120 orders of magnitude too large, i.e., "we are just in an unusual pocket" of the Multiverse).

But if there are non-material aspects of human consciousness, how does this soul get into a body? I don't know that it is necessary to have souls inserted into human zygotes (a zygote is the initial cell formed by the union of two gametes, the egg and sperm of mother and father respectively), as appears to be the Catholic position set out by John Paul II (1997) in "Pope’s message on evolution", *The Quarterly Review of Biology*, Vol. 72, No. 4 (Dec., 1997), pp. 381-383:

"...man is called to enter into a relationship...with God...which will find its complete fulfillment beyond time, in eternity...Pius XII stressed this essential point: if the human body takes its origin from pre-existent living matter, the spiritual soul is immediately created by God..Consequently, theories of evolution which...consider the mind as emerging from the forces of living matter, or as a mere epiphenomenon of this matter, are incompatible with the truth about man....With man, then, we find ourselves in the presence of an ontological difference, an ontological leap, one could say..."

John Donne described beautifully his interpretation of human procreation in this context, in his poem, The Ecstasy:

...Our souls (which to advance their state

Were gone out) hung 'twixt her and me.

...Our bodies why do we forbear?

They'are ours, though they'are not we; we are

The intelligences, they the spheres.

So soul into the soul may flow,

Though it to body first repair.

As our blood labors to beget

Spirits, as like souls as it can,

Because such fingers need to knit

That subtle knot which makes us man,

So must pure lovers' souls descend

*childhood's end* as it were (alluding to the title of one of the science fiction novels of Arthur C. Clarke that I read in my own childhood). The government has recently begun to acknowledge inexplicable aerial phenomena (aka UFOs) observed by the military, which now include video. It would be disappointing if those sightings represented merely advanced technology, but with the usual beast still at the wheel, as it were. The clever 2011 Italian science fiction film, *Arrival of Wang*, explored that possibility.

*Tears for Fears* released in 1985. I can still hear it in my head, though it now has an annoying ring to it, this being the age of the elevation of the crowd, the rejection of distinction and talent, the death throes of democracy. True it was said that when the blind lead the blind, they will all end up in the ditch (Gospel of Matthew, KJV, 15:14). The man (or Man) who said that was killed at the urging of a crowd, see, e.g., Gospel of Matthew, KJV, 27:1 - 27:26.

How many American soldiers have to die before the powers that be stop forcing them into that death-trap Bell Boeing V-22 Osprey? I can be motivated to fight against evil without concern for personal survival (or at least preferring death to dishonor), but I would have difficulty getting into an airplane that can't decide if it is a helicopter or an airplane, and falls out of the sky frequently. It takes a different kind of man to knowingly adhere to discipline when it is clear that the mission will needlessly waste lives. I feel that way about D-Day also, but Eisenhower was one smart, tough man, so I can only pray he knew what he was doing. I watched the movie *Saving Private Ryan* again the other day, and was appalled at the slaughter incurred by a beach assault on heavily fortified positions (I believe veterans of D-Day found that movie very accurate in its portrayal of the conditions).

One must hope that the extent of the Uvalde, Texas massacre of May 24, 2022 was related to following flawed orders rather than a failure of courage of all the officers on the scene. There is a continuing stream of lies and misinformation (presumably being fed to the media piecemeal to lessen the impact) to this day regarding what is a clear matter. The heavily armed police milled around outside for an hour while clearly audible gunfire representing executions of students continued inside and trapped students continued to call 911 begging for help. This situation appears as puzzling to law enforcement experts elsewhere as it does to me. Some of the officers handcuffed parents who tried to go in and fight for their children's lives. Welcome to 21st century USA. When the people select leaders for anything other than intelligence and honor, this is the kind of thing you can expect to see more and more frequently.

As an aside, I note that I don't quite understand Jon Stewart's statement recently when he was awarded the Mark Twain Prize for American Humor at the Kennedy Center, "We need leaders who lead differently." I mean, it is not the method that is the problem so much as the character of the individual wielding power. The Founding Fathers and the Framers of the Constitution did everything they could to constrain power inherent in government, but could only pray that the people would not elect someone without intelligence or honor.

But I try to focus my attention on academic pursuits most of the time (and avoid having my nose rubbed in the mess we are in as a species). I have been studying the statistics of contingency tables and logistic regression in the context of epidemiology since February 2022. I became interested in this area incidentally to looking at indices for analysis of co-occurrence and similarity. That interest was prompted by my initial conflating of the context of Simpson's Paradox with the Simpson association metric. Similarity indices come up, for example, in the analysis of co-occurrence of species in particular regions in ecology. That led me naturally to examine more closely the mathematics underlying the presentation and analysis of contingency tables.

Circling back to Simpson's Paradox in my study, I took hold (like a dog with a bone, well, like an old, toothless dog with a bone, more accurately), of the 1972-1974 Whickham survey data, Tunbridge et al 1977, a one-in-six survey of the electoral roll in Whickham, a mixed urban and rural district near Newcastle upon Tyne, UK, conducted in 1972-1974 to study heart disease and thyroid disease. Specifically, I read the Appleton, French and Vanderpump 1996, "Ignoring a covariate: an example of Simpson's paradox", *American Statistician*, 50(4):340-341 article. In this twenty year followup, a subset of the original survey sample, i.e., women who were classified as current smokers or as never having smoked, were determined to be alive or dead and statistical analysis soon revealed an association reversal (classic manifestation of Simpson's Paradox). That is, the 1996 Whickham subset in aggregate appeared to show less mortality among smokers, yet analysis of the individual age groups showed increased mortality among the smokers, as one might reasonably expect.

I replicated the result (arrived at essentially the same result using different code, but the same data) quoted by Appleton et al 1996. My Mantel-Haenszel test on the first six two-by-two age-band tables (i.e., the 18-24, 25-34, 35-44, 45-54, 55-64, and 65-74 year-old groups, pulled from a copy of the Whickham data, 1314 records consisting of a row number, e.g.,"1314", alive/dead status at twenty years post-survey, e.g.,"Alive", smoking status Yes/No, e.g.,"Yes", and 1974 age of respondent at initial survey, e.g., 41 meaning 41 years old) produced a common odds ratio of 1.522873 (across the six partial tables taken together, the odds of a smoker being dead at twenty-year followup are about 1.52 times those of a non-smoker), χ-squared of 5.5449 on 1 degree of freedom, p-value = 0.01853, 95% confidence interval (1.072226, 2.162923), meaning the data are compatible with the odds of death for smokers being anywhere from about 1.07 to 2.16 times those for non-smokers (intervals built this way should cover the true common odds ratio in about 95 percent of repeated samples). Compare *Appleton et al 1996*: "Woolf's test applied to the first six two-by-two tables gives an overall odds ratio of 1.53 with 95% confidence limits of 1.08 and 2.16."

We note that in 1996 *Appleton et al* meant that they used the common odds ratio equations of what is now called the Woolf Test on Homogeneity of Odds Ratios (no 3-Way association) to obtain the common odds ratio. Nowadays, the Woolf Test is used (along with the Breslow-Day test) to indicate whether odds ratios are about the same among the included partial tables (because the Mantel-Haenszel test assumes that is true and might be misleading otherwise). Before doing the Mantel-Haenszel calculation of the common odds ratio above, we therefore conducted a Woolf Test and could not reject the null hypothesis, with p-value = 0.9848 (a high p-value means that differences among the six odds ratios as large as those observed would not be unusual if the true odds ratios were all equal). That is, our Woolf Test on the six age-band tables from the Whickham data suggested that we should not reject the null hypothesis that the partial age-band tables had more or less the same (homogeneous) odds ratios. Pardon the apparently convoluted reasoning ("yes, we have no bananas," as it were), but it is more correct to say that we don't have enough evidence to reject the hypothesis that the age-bands have about the same odds ratios, than that it is true. So the implicit alternative hypothesis, i.e., that the age-band tables do have significantly different odds ratios (whatever those might be) of mortality for smokers, is not supported.
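For readers who want to see what a homogeneity test of this kind is doing, here is a minimal sketch in base R of the Woolf statistic as it is usually described: a weighted sum of squared deviations of the stratum log odds ratios from their inverse-variance weighted mean, referred to a χ-squared distribution with K - 1 degrees of freedom. The three strata are hypothetical counts invented for illustration, not the Whickham age bands, and the helper name `woolf_sketch` is mine; packaged implementations may differ in details such as continuity corrections.

```r
# HYPOTHETICAL 2 x 2 x 3 table (rows: smoker No/Yes; columns: Alive/Dead).
# These counts are invented for illustration; they are NOT the Whickham data.
strata <- array(
  c(90, 80, 10, 14,    # stratum 1: No/Alive, Yes/Alive, No/Dead, Yes/Dead
    70, 60, 30, 40,    # stratum 2
    30, 20, 70, 75),   # stratum 3
  dim = c(2, 2, 3)
)

# Woolf-type homogeneity statistic (no continuity correction).
woolf_sketch <- function(x) {
  # log odds ratio of each 2 x 2 slice
  log_or <- apply(x, 3, function(m) log(m[1, 1] * m[2, 2] / (m[1, 2] * m[2, 1])))
  # inverse-variance weights; var(log OR) is approximated by the sum of 1/cell
  w      <- 1 / apply(x, 3, function(m) sum(1 / m))
  pooled <- sum(w * log_or) / sum(w)        # weighted mean log odds ratio
  x2     <- sum(w * (log_or - pooled)^2)    # heterogeneity statistic
  df     <- dim(x)[3] - 1
  list(statistic = x2, df = df,
       p.value   = pchisq(x2, df, lower.tail = FALSE),
       pooled_or = exp(pooled))
}

res <- woolf_sketch(strata)
print(res)
```

A large p-value from this calculation, like the 0.9848 reported above, says only that the strata give no evidence against a common odds ratio; it does not prove the odds ratios are equal.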

This being so, we are justified in using the Mantel-Haenszel test, which returned a common odds ratio of about 1.52 for the age-band tables with a p-value of 0.01853, i.e., if the true common odds ratio were one, we would expect to see an association at least this strong by chance only about 2% of the time. We therefore reject the null hypothesis of this test, which was that the common odds ratio of the age-band tables was no different than one. In the Whickham context, an odds ratio of one, or 1:1, would mean smokers have no greater or lesser odds than non-smokers to die.

However, as we said earlier, if you simply add all the ages together and compare the odds of dying for smokers with the odds for non-smokers, the odds ratio in the study is only 0.68 (as also noted by Appleton et al 1996):

    > Whickham_2x2
          outcome
    smoker Alive Dead
       No    502  230
       Yes   443  139
    > OddsRatio(Whickham_2x2)
    [1] 0.6848366
    > OddsRatio(Whickham_2x2, conf.level=0.95)
    odds ratio     lwr.ci     upr.ci
     0.6848366  0.5353300  0.8760973

I guess I should explain that by odds of a smoker being dead, I mean (refer to the numbers in the Whickham_2x2 table above) the number of smokers dead, 139, divided by the number of smokers alive, 443, or 0.3137698. The odds of non-smokers dead to alive is similarly, 230/502 or 0.4581673. The odds ratio is then the death/alive odds for smokers divided by the death/alive odds for non-smokers, 0.3137698/0.4581673 = 0.6848367.
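The same arithmetic can be written out in a few lines of base R. This is a sketch, not the DescTools source: the counts come from the Whickham_2x2 table above, while the interval is a Wald interval on the log scale, which I am assuming is what OddsRatio reports by default (the variable names are mine).

```r
# Aggregate Whickham counts, copied from the Whickham_2x2 table above.
# matrix() fills column by column: Alive = (502, 443), Dead = (230, 139).
tab <- matrix(c(502, 443, 230, 139), nrow = 2,
              dimnames = list(smoker  = c("No", "Yes"),
                              outcome = c("Alive", "Dead")))

odds_smoker    <- tab["Yes", "Dead"] / tab["Yes", "Alive"]   # 139 / 443
odds_nonsmoker <- tab["No",  "Dead"] / tab["No",  "Alive"]   # 230 / 502
or_hat         <- odds_smoker / odds_nonsmoker

# ASSUMPTION: Wald interval, i.e. log(OR) +/- z * sqrt(sum of 1/cell counts).
se_log_or <- sqrt(sum(1 / tab))
ci        <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log_or)

print(round(c(odds_ratio = or_hat, lwr.ci = ci[1], upr.ci = ci[2]), 7))
```

If the Wald assumption is right, the three numbers printed should agree with the OddsRatio output above to several decimal places.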

On the other hand, if we obtain the odds ratio for each of the separate tables, using again the R package DescTools function OddsRatio, we see the following individual odds ratios for each age band:

    > apply(WhickamBound1rst6, 3, OddsRatio)
        A1824     A2534     A3544     A4554     A5564     A6574
    2.6792453 0.9344262 1.5639098 1.4951456 1.6051829 1.4323432

Why this contradiction between the odds ratios from the total table (0.6848366) vs the sub-tables by age group (2.6792453, 0.9344262, 1.5639098 ...)? The Whickham data was the outcome of a so-called experiment of nature, i.e., an experiment where the epidemiologist merely observed the outcome (whether the survey subjects were dead or alive at 20 year followup) passively and then cross-classified the subjects into smoking and non-smoking groups. Because the age of subjects varied, and age is strongly related to mortality over twenty years and (in this sample) to smoking status as well, the outcome is said to be confounded by the effect of age mixing with the effect of smoking.

Since this is not a controlled experiment, the ages of the subjects are not fixed. However, it is possible to stratify the data on the subjects by age as well as smoking status. In that manner the experiment is transformed more or less into a series of smaller experiments within which age is controlled. In other words, the data can be partitioned into tables of smokers and non-smokers at each age band and a comparison of the dead/alive ratio within each age-band made (the statistical comparison is then of odds of death conditional upon age). We gave you the odds ratios of those individual age-stratified tables above for the 18-24, 25-34, 35-44, 45-54, 55-64, and 65-74 year-old groups.

If the assumption can be made that these comparisons are estimating the same factor within each age stratum (which was suggested by our Woolf Test result above finding no significant heterogeneity in the odds ratios of the age-band strata), i.e., the mortality odds ratio can be attributed to smoking status within each age-band, then methods like the Mantel-Haenszel test can be applied to combine the odds ratios from the separate age-band tables (as we did above) and obtain a useful estimate of the overall mortality odds ratio.

We used some plotting software (see *A Fourfold Display for 2 by 2 by k Tables*, Michael Friendly, Psychology Department, York University, January 24, 1995, Report Number: 217) now included in the R statistical program, *R: A Language and Environment for Statistical Computing* (that we were using above, the OddsRatio function being part of the *DescTools: Tools for Descriptive Statistics* package available for R), to graphically display the odds ratio in aggregate:

In that figure above, you see the non-smokers displayed in the top row (the top half of the circle, label above "smoker: No"). The smokers are displayed in the bottom row (the bottom half circle, label "smoker: Yes"). The left half of both rows are the number of subjects still alive at 20 years, labeled on the left "outcome: Alive." So you see there are 502 non-smokers alive and 443 smokers alive, as we showed earlier in our aggregate 2 x 2 table (2 x 2 meaning there are two rows and two columns in the table). On the right, labeled "outcome: Dead", are shown the quadrants of the non-smoker (top row) and smoker (bottom row) subjects who are dead at twenty year followup. The relative size of the quadrants indicates the relative magnitude of the odds for that row. So you see the non-smokers, with 230 dead, have a dark blue quadrant that is larger than the 502 alive non-smoker quadrant in light blue next to it, while the smoker live quadrant, with 443 subjects in lower left dark blue, is larger than the smoker dead quadrant next to it with 139 dead smokers. This shows us graphically that the odds of non-smokers being dead is greater than the odds of smokers being dead, i.e., inverting that statement, the odds ratio of smokers mortality to non-smokers mortality is less than one, OR = 0.6848366, as we discussed above.

The fourfold plot of each of the six age-band tables shows the opposite relationship, (these plots illustrate graphically the individual table odds ratios we gave earlier for each of the age-bands):

Now we see, for example, that the 18-24 year-old group has a larger odds ratio of non-smokers alive vs smokers. The top left quadrant in the Strata: A1824 figure is large blue, alive non-smoker count 71, and its dead count of 1, to right of that quadrant, is light blue smaller. The bottom right quadrant of Strata: A1824, the dead smokers with count 2, is large blue compared with its live smoker light blue quadrant just to left with count 53. This relationship is repeated in all six of the figures in this graphic, except for the Strata: A2534 figure, which shows not much of a difference in the mortality odds for smoker vs non-smoker for this 25-34 year-old subject group (all the quadrants are about the same size in this particular group).
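The 18-24 stratum can be checked by hand from the four counts just quoted (71 and 1 for non-smokers, 53 and 2 for smokers). This is a small base R sketch; the object name a1824 is ours, used only for illustration.

```r
# Counts for the 18-24 age band as read off the fourfold plot described above.
# The object name is an illustrative assumption.
a1824 <- matrix(c(71, 53, 1, 2), nrow = 2,
                dimnames = list(smoker  = c("No", "Yes"),
                                outcome = c("Alive", "Dead")))

# Odds of being dead in each row, then smokers relative to non-smokers
odds_dead_1824 <- a1824[, "Dead"] / a1824[, "Alive"]
or_1824 <- unname(odds_dead_1824["Yes"] / odds_dead_1824["No"])

print(round(or_1824, 7))
```

With only three deaths in this stratum, the estimate rests on very few events, so its size should not be over-interpreted.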

Appleton et al 1996 suggested that calculating a standardized mortality ratio or SMR for smokers and non-smokers would be another way to resolve the age confounding problem. The SMR is the ratio of observed to expected deaths in a population.

In our context we are not trying to obtain an absolutely comparable SMR of smokers and non-smokers (one for each), but rather using indirect standardization to obtain a standard population rate (per age stratum) by which the expected mortality of smokers and non-smokers can be calculated and the ratio of their SMRs taken as a proxy for a common odds ratio as calculated by the Mantel-Haenszel test or other means. If we wanted instead to have an idea of how the SMR of smokers and non-smokers compared to the general population mortality, because the length of time until followup is significant (twenty years), we would have to take account of the increasing age of the subjects during that time and promote each subject through the relevant age bands of standard mortality during the length of the followup.

Given that we will use identical standard mortality rates (per age group), for both smokers and non-smokers, and given that we establish that the observed odds ratios within each Whickham group, smokers and non-smokers separately, by age band or strata, are homogeneous (by Woolf Test), we will be justified in comparing the resulting SMR of smokers to the SMR of non-smokers, i.e., creating a common odds ratio that could be compared to the Mantel-Haenszel 1.53 result earlier.

The technical requirements for using a ratio of two SMRs as an alternative estimate of the common odds ratio are discussed in Section 15.8 of *Statistical Models in Epidemiology* by David Clayton and Michael Hills (a 1994 textbook intended for use in a graduate course on the statistical basis of epidemiology). We believe we have by design satisfied these technical requirements (as we said above, we use identical standard rates for both groups and establish that the groups are internally homogeneous by age-band regarding the effect of smoking on mortality), but note that the authors advise this practice should usually be avoided (because both sets of age-specific rates must be proportional to the reference rates, as we discussed).

We calculate standard annual mortality rates per 5-year age bands (because that was convenient given the UK data available) using population estimates and numbers of deaths from the *Historic Mortality Data Files* database published by the *Office for National Statistics* (United Kingdom). Section 6.5 of *Statistical Models in Epidemiology* by David Clayton and Michael Hills describes what we are doing here, i.e., "For example, the all-cause mortality rate for the age band 50-54 during 1983 is estimated by D/Y where D is the number of deaths during 1983 for which the subject's age at death was in the range 50-54, and Y is the person-time lived during 1983 by that part of the population whose ages were in the range 50-54 during 1983." For our purposes, Y is simply the population of females in the year of interest, in the particular age-band under consideration, which is available in the UK POPLNS.csv data file (see below).

We multiply the calculated UK annual standard mortality rates by 20 in our application since our Whickham study extends over 20 years. From that 20-year rate we calculate the number of deaths we should expect to observe given the number of subjects in the Whickham study. We then take the ratio of the observed number of Whickham deaths to our expected number of deaths to obtain an SMR for smokers and an SMR for non-smokers. We then compare the ratio of the two SMRs to our Mantel-Haenszel common odds ratio of 1.53.

We obtained the *Historic Mortality Data Files* database published by the *Office for National Statistics* (United Kingdom) in 1997 under the title "Twentieth Century Mortality Files", i.e., the RG69-2 NDAD or CRDA/20/DS/2 dataset. We secured this copy at [UK National Archives], or in particular, [RG 69/2 1901-1995 Historic Mortality: 1901-1995 dataset].

RG69-2 consists of three types of tables: a Population table (POPLNS) covering the period 1901-1995; nine Historic Deaths tables (ICD1DTHS-ICD9DTHS) corresponding to the different revisions of the ICD which were implemented in England and Wales in 1911-1995; and nine ICD dictionary tables (ICD1DESC-ICD9DESC), which explain the codes used for causes of death in the Historic Deaths tables. We aggregated the deaths for the year we experimented with (1974), so were not interested in the ICD cause of death.

As we described above in general terms, the number of United Kingdom deaths from all causes (we aggregated the figures for different ICD causes of death) in 1974 for each 5-year age group, in data file ICD8DTHS, is then divided by the United Kingdom population total in 1974 (in data file POPLNS) for each of those 5-year age groups to obtain a standard mortality rate from all causes for each age-band.

We accomplished the above operations using a combination of Linux shell utilities, open source spreadsheet software and the R statistical package (we did not spend valuable time to be elegant about it). Once we had the rates, we could multiply them by the subject numbers in each age group in the Whickham data to obtain the expected number of deaths at 20 years in the smoker and non-smoker groups (again, using whatever software and techniques got the job done with minimum design time, primarily spreadsheet and R at this point). Taking the ratio of observed to expected deaths in each group, we obtained the SMR for each and then compared the two SMRs. We calculated the SMRs manually in a spreadsheet and then separately using an R statistical package for epidemiology, *epitools*, specifically the epitools::ageadjust.indirect() function, obtaining identical results.

We show you some of the R code to give you an idea what the numbers and code looked like (I plan to write up this study soon in a formally typeset pdf so I can write the mathematics properly and accompany that file with all the data and code; I will add a link here to whatever repository I utilize, probably GitHub). First, the Whickham age-bands subset we used here, non-smokers first (note that for convenience we begin at age 20 in this analysis, vs 18 in the Mantel-Haenszel analysis earlier; it does not make much difference in the result). By "case" we mean "dead at 20 year followup." By "population" we mean "the number of Whickham subjects in this group initially surveyed in 1974":

> Whick_nosmoke_case_count

Cases

20-24 0

25-29 2

30-34 2

35-39 4

40-44 3

45-49 7

50-54 5

55-59 20

60-64 21

65-69 51

70-74 50

> Whick_nosmoke_pop_by_age

No smoker subjects

20-24 53

25-29 77

30-34 79

35-39 61

40-44 50

45-49 38

50-54 40

55-59 61

60-64 61

65-69 68

70-74 61

> str(Whick_nosmoke_pop_by_age)

num [1:11, 1] 53 77 79 61 50 38 40 61 61 68 ...

- attr(*, "dimnames")=List of 2

..$ : chr [1:11] "20-24" "25-29" "30-34" "35-39" ...

..$ : chr "No smoker subjects"

> sum(Whick_nosmoke_pop_by_age)

[1] 649

Next we create similar data structures (in R) for the smokers:

> Whick_smoke_case_count <- c(2,1,2,3,7,15,13,21,31,13,18)

> Whick_smoke_case_count <-

+ matrix(Whick_smoke_case_count, 11, dimnames = list(c("20-24", "25-29", "30-34", "35-39", "40-44",

+ "45-49", "50-54","55-59","60-64","65-69","70-74"), c("Cases")))

> Whick_smoke_pop_by_age <- c(39,57,68,56,49,69,62,50,66,16,21)

> Whick_smoke_pop_by_age <- matrix(Whick_smoke_pop_by_age, 11, dimnames = list(c(

+ "20-24", "25-29", "30-34", "35-39", "40-44", "45-49",

+ "50-54","55-59","60-64","65-69","70-74"), c("smoker subjects")))

We won't try to show you the spreadsheet where we calculated the UK rates. Nor will we set out the various maneuvers we used to "wrangle" the large text file UK data to get just the age-bands and sex and aggregate death counts for 1974 which went into the spreadsheet. Hadley Wickham and Garrett Grolemund suggested the term "wrangling" for the usual fight of a data analyst to put the raw data into convenient form. Hadley's surname is an instance of synchronicity, i.e., that it so nearly matches the name of our survey data is a coincidence, as far as I know. Here are the standard mortality rates that we calculated, entered into R:

> UK_1974_fem_mort_rate <- c(0.000407317506364, 0.000449187985443, 0.000643384822028,

+ 0.001124680488498, 0.00193578375286, 0.003400379249628, 0.005290760543073,

+ 0.0081181522061, 0.012307129555733, 0.019872025307355, 0.033956337906636)

And we added age-band labels to those numbers:

> UK_1974_fem_mort_rate <- matrix(UK_1974_fem_mort_rate, 11, dimnames =

+ list(c("20-24", "25-29", "30-34", "35-39", "40-44", "45-49",

+ "50-54","55-59","60-64","65-69","70-74"),

+ c("UK 1974 fem mort rate")))

Then we get the SMR for non-smokers using that data:

> SMR_WhickNoSmokers_1974indirSt <- ageadjust.indirect( count = Whick_nosmoke_case_count, pop =

+ Whick_nosmoke_pop_by_age,

+ stdrate = UK_1974_fem_mort_rate * 20, stdcount = NULL, stdpop = NULL, conf.level = 0.95)

> SMR_WhickNoSmokers_1974indirSt$sir

observed exp sir lci uci

165.000000 105.636375 1.561962 1.340925 1.819434

The epitools::ageadjust.indirect return value $sir is the Standardized Incidence Ratio, which is the SMR or standardized mortality ratio when the outcome of interest is mortality. If you are interested in reading more about the subject, *Boston University School of Public Health*, bu.edu/otlt/MPH-Modules/EP has an excellent treatment, *EP713_StandardizedRates*. Notice that the NoSmokers sir value returned is observed deaths among non-smokers (165.000000) divided by exp, the expected deaths among non-smokers using our UK standard mortality rates (105.636375), or non-smokers SMR = 1.561962.
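For readers without epitools installed, the non-smoker figures can be reproduced in base R from the numbers printed above. This is a sketch rather than the package's own code: the variable names are ours, and the confidence limits assume a log-scale normal approximation with standard error 1/sqrt(observed), which happens to reproduce the printed lci and uci but is our assumption about how they were obtained.

```r
# Non-smoker deaths and subjects per 5-year age band, copied from the printouts above
age_bands     <- c("20-24", "25-29", "30-34", "35-39", "40-44", "45-49",
                   "50-54", "55-59", "60-64", "65-69", "70-74")
nosmoke_cases <- c(0, 2, 2, 4, 3, 7, 5, 20, 21, 51, 50)
nosmoke_pop   <- c(53, 77, 79, 61, 50, 38, 40, 61, 61, 68, 61)

# UK 1974 female annual mortality rates per age band, as entered above
uk_rate_1974 <- c(0.000407317506364, 0.000449187985443, 0.000643384822028,
                  0.001124680488498, 0.00193578375286,  0.003400379249628,
                  0.005290760543073, 0.0081181522061,   0.012307129555733,
                  0.019872025307355, 0.033956337906636)
names(uk_rate_1974) <- age_bands

# Expected deaths per band: subjects x annual rate x 20 years of follow-up
expected_by_band <- nosmoke_pop * uk_rate_1974 * 20

observed <- sum(nosmoke_cases)
expected <- sum(expected_by_band)
smr      <- observed / expected          # observed over expected

# Assumed interval: normal approximation on the log scale, se = 1 / sqrt(observed)
z  <- qnorm(0.975)
ci <- smr * exp(c(-1, 1) * z / sqrt(observed))

print(round(c(observed = observed, exp = expected, sir = smr,
              lci = ci[1], uci = ci[2]), 6))
```

Swapping in the smoker counts and subject numbers gives the smoker SMR in the same way.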

And the SMR for smokers using that data:

> SMR_WhickSmokers_1974indirSt <- ageadjust.indirect( count = Whick_smoke_case_count, pop =

+ Whick_smoke_pop_by_age, stdrate = UK_1974_fem_mort_rate * 20, stdcount = NULL, stdpop = NULL, conf.level

+ = 0.95)

> SMR_WhickSmokers_1974indirSt$sir

observed exp sir lci uci

126.000000 61.098835 2.062232 1.731835 2.455662

Notice that the Smokers sir (SMR, standardized mortality ratio) is again observed deaths (126.000000) divided by exp (expected deaths) (61.098835), or SMR smokers = 2.062232. Then we look at the ratio of the two SMR's:

> SMR_WhickSmokers_1974indirSt$sir[3] / SMR_WhickNoSmokers_1974indirSt$sir[3]

sir

1.320283

"sir" is the standardized incidence ratio, the SMR in our context. So we just divided 2.062232 (sir from smokers above) by 1.561962 (sir from non-smokers above) to obtain an equivalent common odds ratio of 1.3202834, the odds of mortality for smokers vs non-smokers common across the "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74" age-band tables. That compares to the common odds ratio of 1.522873 (roughly 1.53) obtained using the Mantel-Haenszel test on about the same data, but partitioned into only six tables spanning ages 18 - 74 (recall we were using this partitioning and range in order to follow the result published by Appleton et al 1996) rather than 20 - 74. We also checked the Mantel-Haenszel result on the 11-table partition over 20 - 74 and obtained very little difference, common odds ratio 1.531171. So the SMR ratio of 1.32, our proxy for the common odds ratio, is about 14% lower than the Mantel-Haenszel odds ratio of 1.53, not too much difference.

We decided to see what a logistic regression on the 20 - 74 Whickham data (flat, i.e., not partitioned into tables, but otherwise the same data used above) would tell us (using the 20 - 74 age range so we could compare more closely to our SMR ratio work). Some of you may wonder why we included an explicit continuous variable for the square of the age variable rather than simply specifying that in the R formula, e.g., I(age^2). This was merely convenient during model testing with various utilities. Some of the R code:

# logistic fit on 20 - 74 Whickham data

> Whickham2074_glm_01b <- glm(outcome ~ age + ageSqrd + smoker, family = binomial, data =

+ d.Whickham2074w10cols)

We obtained:

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) -4.7591905 1.2492421 -3.810 0.000139 ***

age 0.0110796 0.0509544 0.217 0.827864

ageSqrd 0.0010282 0.0005002 2.056 0.039820 *

smokerYes 0.3178833 0.1738435 1.829 0.067466 .

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The antilog of the regression coefficient smokerYes = 0.3178833:

> exp(0.3178833)

[1] 1.374216

That tells us that the logistic fit on the 20 - 74 year-old Whickham data can be characterized by a common odds ratio of 1.37 (mortality odds for smokers vs non-smokers), consistent with our SMR ratio calculation of 1.32 (the logistic estimate is about 4% higher) and with the Mantel-Haenszel results above (about 10% lower than those). A plot of that regression fit for smokers vs non-smokers shows the increasing effect of age combined with smoking status on the probability of being dead at twenty years for smokers vs non-smokers:

The vertical axis in the Logistic regression graph is the probability ("Pr(Dead)") of being dead. This was a logistic fit, so the (natural) logarithm of the odds, i.e., of probability dead / (1 - probability dead), is a linear function of the X variables (and is often called the log odds). This is also referred to as the logit transformation of the probability of success (success here is "Dead" since that is our target interest, albeit not the type of success one would normally wish for), probability Pr(Dead):

log{ probability(Dead) / [ 1 - probability(Dead) ] } = glm(outcome ~ age + ageSqrd + smoker)

where the equation above becomes:

log{ probability(Dead) / [ 1 - probability(Dead) ] } = beta_0 + beta_1 X_1 + beta_2 X_1^2 + beta_3 X_3

where the beta's and X's are, from our glm result from R earlier above:

beta_0 is coefficient (Intercept)= -4.7591905

beta_1 is coefficient for age (age is X_1) = 0.0110796

beta_2 is coefficient for age^2 (age squared is X_1 * X_1) = 0.0010282

beta_3 is coefficient for smokerYes = 0.3178833

and the equation underlying the graphed lines is then:

log{ probability(Dead) / [ 1 - probability(Dead) ] } = -4.7591905+0.0110796(age)+0.0010282(age*age)+0.3178833(smokerYes=1/0)

In other words, the graph line for the smokers in the graph uses the beta_3 coefficient = 0.3178833 in the equation (since it is multiplied by "1" for smoking status equal to "Yes") and the graph line for non-smokers in the graph does not use beta_3 coefficient = 0.3178833 (because it is multiplied by "0" for nonsmoker and so is zeroed out).

In making the graph, the R software calculated numbers for the rhs (right hand side) of the logit equation above, separately for smokers (with the beta_3 coefficient present) and non-smokers (beta_3 coefficient not included), for ages from 20 to 74 (the x-axis of the graph). It then obtained the probability of death from those numbers by exp(logit) / [1 + exp(logit)] (this is obtained by some simple algebra using the above equation). In exp(logit) here, we mean "substitute for logit the set of numerical values obtained by evaluating the rhs of the equation above for each x-axis age."
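The calculation behind the two graph lines can be sketched in a few lines of base R using only the coefficients printed earlier. The function and variable names here are ours, for illustration; this is not the plotting code actually used for the figure.

```r
# Coefficients copied from the glm summary above
beta <- c(intercept = -4.7591905, age = 0.0110796,
          ageSqrd = 0.0010282, smokerYes = 0.3178833)

# Probability of being dead at follow-up for a given age and smoking status (1 = Yes, 0 = No)
pr_dead <- function(age, smoker) {
  logit <- beta[["intercept"]] + beta[["age"]] * age +
           beta[["ageSqrd"]] * age^2 + beta[["smokerYes"]] * smoker
  exp(logit) / (1 + exp(logit))   # inverse logit; base R's plogis(logit) gives the same value
}

ages        <- 20:74
p_smoker    <- pr_dead(ages, 1)
p_nonsmoker <- pr_dead(ages, 0)

# The odds ratio implied by the fitted curves is the same at every age
odds      <- function(p) p / (1 - p)
or_by_age <- odds(p_smoker) / odds(p_nonsmoker)

print(round(range(or_by_age), 6))
print(round(cbind(age = c(20, 50, 74),
                  smoker    = pr_dead(c(20, 50, 74), 1),
                  nonsmoker = pr_dead(c(20, 50, 74), 0)), 4))
```

Because the model has no age-by-smoking interaction term, the gap between the curves in probability terms changes with age while the odds ratio stays fixed at exp(beta_3).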

So what do all those numbers tell us about the common odds ratio of dying for smokers vs non-smokers for the women in the Whickham study?

Recall that our Mantel-Haenszel test on the first six two-by-two age-band tables (i.e., the 18-24, 25-34, 35-44, 45-54, 55-64, and 65-74 year-old groups) gave a common odds ratio of 1.522873, with 95% confidence interval (1.072226, 2.162923).

The logistic fit on ages 20 - 74 Whickham data gave us an equivalent common odds ratio of 1.374216 with 95% confidence interval (0.9794831, 1.9376965). We obtained the confidence interval using the model object returned by glm() earlier, processing it with R function confint(), which uses a profile likelihood method to calculate those limits when it recognizes a glm object (we also calculated the limits manually using the normality assumption and obtained similar limits 0.9774091 to 1.932117).
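The manual limits mentioned above, the ones based on the normality assumption, can be reproduced from the smokerYes estimate and standard error in the coefficient table. This is a sketch of that hand calculation only, with variable names of our choosing; the profile likelihood limits need the fitted model object and confint().

```r
# smokerYes estimate and standard error, copied from the glm summary above
est <- 0.3178833
se  <- 0.1738435

# Wald interval on the log-odds scale, then exponentiated to the odds-ratio scale
z       <- qnorm(0.975)
wald_ci <- exp(est + c(-1, 1) * z * se)

# z statistic and two-sided p-value, for comparison with the summary table
z_stat  <- est / se
p_value <- 2 * pnorm(-abs(z_stat))

print(round(c(OR = exp(est), lower = wald_ci[1], upper = wald_ci[2]), 6))
print(round(c(z = z_stat, p = p_value), 4))
```

The Wald limits are slightly narrower than the profile likelihood limits quoted above, which is common with samples of moderate size.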

The logistic confidence interval includes "1", and the lower limit of the Mantel-Haenszel interval is only slightly above "1", which means the common odds ratio results are at best marginally significant (if your odds ratio is 1:1 for mortality, smoking vs non-smoking, there is no difference in mortality). However, we find the graph of the logistic regression fit for probability of death for smokers vs non-smokers fairly convincing that there is a reliable increase in the probability of being dead at twenty-year followup for smokers vs non-smokers, controlling for the confounding factor of age (which should come as no surprise, there having been numerous well-designed studies done on the subject of the health risk of smoking).
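To set the estimates from this post side by side, here is a small base R sketch that uses only figures already quoted above. The vector and function names are ours, for illustration.

```r
# Estimates quoted in the text above
smr_smokers    <- 2.062232
smr_nonsmokers <- 1.561962
smr_ratio      <- smr_smokers / smr_nonsmokers

estimates <- c(SMR_ratio     = smr_ratio,
               MH_6_tables   = 1.522873,
               MH_11_tables  = 1.531171,
               logistic_glm  = 1.374216,
               aggregate_2x2 = 0.6848366)

# Percentage by which x falls below a reference value
pct_below <- function(x, ref) 100 * (ref - x) / ref

print(round(estimates, 4))
print(round(pct_below(smr_ratio, estimates[["MH_6_tables"]]), 1))
print(round(pct_below(estimates[["logistic_glm"]], estimates[["MH_6_tables"]]), 1))
```

All three age-adjusted estimates sit above 1, while the unadjusted aggregate table alone points the other way.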