Markov-chaining my PhD thesis II

Matt Dray (@mattdray)

I posted a while back about using a Markov chain to generate sentences using my PhD thesis as input. I also posted about the {markovifyR} package for generating lyrics by The Mountain Goats.

This is a quick update to that original post, but this time I’m using {markovifyR}.

The PhD text is available from my {dray} package. I’ll remove the blank lines and ignore the preamble and references section. As a reminder, the thesis is about the decomposition of tree litter that’s been exposed to environmental stressors like elevated carbon dioxide levels and acidified streams.

library(dray)  # remotes::install_github("matt-dray/dray")
phd_text <- dray::phd  # get data
phd_text <- phd[nchar(phd) > 0]  # remove blank lines
phd_text <- phd[27:376]  # ignore preamble and references

Now to prepare the workspace. {markovifyR} is a wrapper of the markovify Python module; you need to install markovify with a call to system() and then also ensure you’ve installed {markovifyR} from GitHub and also th dependency {furrr}.

# Prepare the workspace
# system("pip install markovify")
library(markovifyR)  # remotes::install_github("abresler/markovifyR")
library(furrr)  # install.packages("furrr")

The function generate_markovify_model() builds the model and markovify_text() generate some sentences based on that model.

# Build model
markov_model <- generate_markovify_model(
  input_text = phd,
  markov_state_size = 2L,
  max_overlap_total = 25,
  max_overlap_ratio = 0.7

# Generate lines
phd_speak <- markovify_text(
  markov_model = markov_model,
  maximum_sentence_length = NULL,
  output_column_name = 'phd_speak',
  count = 50,
  tries = 100,
  only_distinct = TRUE,
  return_message = TRUE
## phd_speak: Taxon richness and diversity of terrestrial and aquatic ecosystems, allowing for a better understanding at several spatial scales.
## phd_speak: All individuals were adults, apart from larval Odontocerum albicorne showed a feeding preference in the no-choice test using two-way ANOVA.
## phd_speak: Princeton University Press, Cambridge.
## phd_speak: The study of rotting detritus is not for the studies reported in Chapter 3 microbial conditioning was shown to affect leaf litter consumption by terrestrial and aquatic habitats: effects at the forest scale.
## phd_speak: Journal of the studies reported in this thesis considered decay rates in both terrestrial and aquatic locations could affect the chemical composition of ambient- and elevated-CO2 litters.
## phd_speak: Chapter 7 synthesises the findings of the Linnean Society, 91, 83–94.
## phd_speak: Invertebrate species composition may also help explain the reduced decomposition rates and suppressed acid-sensitive families of Plecoptera, the major colonists of litter chemical composition, but little is known to affect plants physically.
## phd_speak: This study aimed to understand how changes to atmospheric composition has the potential for algal colonisation of leaf Dry Mass
## phd_speak: Bulletin of the no-choice test, the main and interactive effects of elevated CO2.
## phd_speak: This simplicity has allowed for broad underlying principles to be made available to outside organisations.
## phd_speak: For example, more work is required on the habitat: nitrogen increased and C/N ratio was reduced through time in all experiments, but different experimental conditions varied in their responses, suggesting that caution has to be taken when extrapolating general trends from single-species studies.
## phd_speak: This could cause increased retention of SWD in terrestrial and aquatic invertebrate detritivores was species-specific.
## phd_speak: A GLMM was also fitted to the use of loosely bound leaf packs could be disrupted by atmospheric CO2 reduced litter palatability to macroinvertebrate detritivores with consequences for ecological interactions in both terrestrial and aquatic organisms.
## phd_speak: Biological Reviews of the experiments.
## phd_speak: Journal of Ecology, Evolution, and Systematics, 17, 567–594.
## phd_speak: Most species did not result in major differences in chemical composition of litter decay.
## phd_speak: Pollutants dissolved in rainwater – particularly sulphur and nitrogen concentrations and greater C/N ratios.
## phd_speak: In turn, foliar chemistry is affected, changing the chemical composition differently depending on whether it was exposed to terrestrial and aquatic organisms.
## phd_speak: To investigate these effects, twigs of Betula pendula were produced for collection at each time period.
## phd_speak: This work has not been submitted in substance for any degree or other award.
## phd_speak: Relatively little work on the mesh of the North American Benthological Society, 21, 16–27.
## phd_speak: 1000 ppm in the choice and no-choice tests.
## phd_speak: A meta-analytic review of the litter layer.
## phd_speak: Multiple stressor effects including greenhouse gases, urban pollution can increase it.
## phd_speak: Trees were propagated from 16 March–29 October 2011 from in situ silver birch trees.
## phd_speak: In contrast to this material also seemed to be used for chemical analyses.
## phd_speak: Discs were replenished when at least in the aquatic experiment, 48 twig bags of each study site and secured with 0.5 m steel rod.
## phd_speak: The use of macroinvertebrates of the North American Benthological Society, 16, 750–759.
## phd_speak: It will therefore be important to understand how this will affect litter quality and breakdown, but not sufficiently to affect invertebrate feeding in Chapter 5, but the direction of response, only that there is a major source of energy for headwater streams of contrasting pH.
## phd_speak: Twig chemical composition and decomposition of organic matter.
## phd_speak: It will therefore be important to understand how changes to litter chemical composition and its subsequent decomposition.
## phd_speak: This could cause increased retention of SWD may be more important than effects of initial chemical values, and stored at room temperature prior to experimentation.
## phd_speak: Gulis, V., Rosemond, A.D., Reice, S.R., Elwood, J.W.
## phd_speak: Effect of CO2 treatment on twig chemical composition and the total nitrogen content of the atmosphere.
## phd_speak: This thesis is the result of the Cambridge Philosophical Society, 88, 327–348.
## phd_speak: Pairs of contrasting pH.
## phd_speak: Threads allocated to 0 and 28 days of exposure, and their effects on invertebrate assemblages.
## phd_speak: Litter decomposition promotes carbon and nutrients to the experiments to allow for gut clearance.
## phd_speak: For the aquatic study and those from 2012 were used to encase the leaves, but rather a thread is used to better mimic natural situations.
## phd_speak: NMDS then iteratively assigns samples to a woodland floor for 0, 28, 56 and 112 days.
## phd_speak: These results suggest that ongoing atmospheric changes could impact litter chemistry, affecting invertebrate feeding in Chapter 5, but the direction of these changes on consumer organisms.
## phd_speak: During litter breakdown, environmental stressors had variable effects on the functioning of both terrestrial and aquatic ecosystems, and can inform mitigation strategies.
## phd_speak: Thesis submitted for the choice test, consuming more ambient- than elevated-CO2 birch discs, but there are still knowledge gaps to be released and subsequently cycled, which supports food webs from the water.
## phd_speak: Invertebrate feeding responses to alder were highly idiosyncratic in the choice test, leaf palatability affected invertebrate feeding, but this was dependent on the chemical composition of litter from the bottom-up.
## phd_speak: The trees used in this study suggest that the parameter was removed during model minimisation; the GC × T interaction was removed during model minimisation.
## phd_speak: Only one tree and one invertebrate species from different habitats is essential if we are to predict the response of litter decomposition in deciduous temperate forests and adjacent streams.
## phd_speak: Canhoto, C. & Coomes, D.A.
## phd_speak: Further work is required to understand how this linkage might be affected by CO2 treatment on mass loss, although invertebrate diversity and richness generally fell through time in the terrestrial study.
## phd_speak: Literature review: The effects of multiple, potentially interacting, stressors on this substrate.
## phd_speak: Chapter 7 synthesises the findings expand on our current understanding of how humans may mitigate or cope with future perturbations to ecosystem service provision.

I ran this a few times and picked some favourites (my comments in square brackets):

  • In general, litter chemical composition and decomposition. [This is the ultimate summary of my thesis.]
  • Bonferroni-adjusted critical significance levels were compared by pairwise PERMANOVA. [That sure sounds like statistics!]
  • Detritivorous macroinvertebrate species used to overcome this issue in future, where no material is used to hold them together. [That’s the basis for a sci-fi movie, surely?]
  • A GLMM was also affected by growth condition, to be made available to outside organisations. [Not technically wrong: a variable affected a model, that model has been made available. Just sounds weird.]
  • Canadian Journal of the laboratory in a randomly-ordered 3 × 3 grid. [That is one niche academic journal.]
  • I am grateful for the choice test. [I designed and ran it, so I have only myself to thank.]
  • This thesis is the total nitrogen content of the North American Benthological Society. [I’m sure the NABS has more nitrogen in its possession.]
  • This simplicity has allowed for broad underlying principles to be linked to litter chemical composition differently depending on whether it was exposed to a labeled 0.5 m steel rod. [Holy smokes, who knew that the steel rods were the deciding factor? Were they radioactive or something?]
  • My parents have never wavered in their responses, suggesting that caution has to be available online in the CEF. [It seems the Controlled Environment Facility (CEF) cannot control for the effect of parental input; best to provide some guidance on what to do if parental influence is strong.]
  • This reduced decomposition rate in the University’s Open Access repository and for inter-library loan, and for inter-library loan, and for inter-library loans. [Ah good, slower-decomposing books will last longer; apparently this is important for inter-library loans? ]
  • Deciduous woodlands are dependent on the parent tree. [Yes, all deciduous woodlands have one central master tree from which all the others are descended.]
  • Currently there is a key ecosystem process in temperate deciduous woodlands and streams. [Yes, but what is it? I must know!]

Once again, simply copy and paste several of these sentences into your own thesis. The benefit is that you won’t have to grow any trees in a high carbon-dioxide atmosphere; you won’t have to spend months packing leaves into tiny bags; you won’t have to attach those bags to tens of IKEA cutlery holders and dunk them in frozen streams in mid-Wales; you won’t have to grind those leaves into a suspicious-looking powder and transport them cross-country for chemical analyses; you won’t have to imprison any insects against their will.

Good luck at your viva!

Session info

## [1] "Last updated 2019-04-30"
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## Locale: en_GB.UTF-8 / en_GB.UTF-8 / en_GB.UTF-8 / C / en_GB.UTF-8 / en_GB.UTF-8
## Package version:
##   assertthat_0.2.0   base64enc_0.1.3    BH_1.69.0.1       
##   blogdown_0.11      bookdown_0.9       cli_1.1.0         
##   codetools_0.2-16   compiler_3.5.2     crayon_1.3.4      
##   digest_0.6.18      dplyr_0.8.0.1      dray_0.0.0.9000   
##   emo_0.0.0.9000     evaluate_0.13      fansi_0.4.0       
##   furrr_0.1.0        future_1.12.0      gifski_0.8.6      
##   globals_0.12.4     glue_1.3.0.9000    graphics_3.5.2    
##   grDevices_3.5.2    grid_3.5.2         highr_0.8         
##   htmltools_0.3.6    httpuv_1.5.0       jsonlite_1.6      
##   knitr_1.22         later_0.8.0        lattice_0.20-38   
##   listenv_0.7.0      lubridate_1.7.4    magrittr_1.5      
##   markdown_0.9       markovifyR_0.101   Matrix_1.2-16     
##   methods_3.5.2      mime_0.6           parallel_3.5.2    
##   pillar_1.3.1       pkgconfig_2.0.2    plogr_0.2.0       
##   plotrix_3.7-4      promises_1.0.1     purrr_0.3.0       
##   R6_2.4.0           RColorBrewer_1.1-2 Rcpp_1.0.1        
##   reticulate_1.11.1  rlang_0.3.1        rmarkdown_1.12    
##   rstudioapi_0.9.0   servr_0.13         stats_3.5.2       
##   stringi_1.4.3      stringr_1.4.0      tibble_2.1.1      
##   tidyselect_0.2.5   tinytex_0.11       tools_3.5.2       
##   utf8_1.1.4         utils_3.5.2        wordcloud_2.6     
##   xfun_0.5           yaml_2.2.0