campLabSpace/

datedHumanGenome/
This directory contains a file for each node of the tree, listing the human regions associated with that node.

allSpecies.bed: all of our alignable regions in the human-referenced alignments (this was the input file for our calculation of the ancestor for human). This now includes zebrafish, so it is different from the previous version in runOne/.
speciesNode.list: different from the one in runOne/. This file specifies how each species is related to human in the human-referenced alignments, for running calculateMrca.csh.
campTree.png: image of the tree, with labelled nodes that correspond to the speciesNode.list file.
campLabTree.nh: the Newick file of the new tree.
campDist.txt: relative distances for each pairwise comparison in the tree.
speciesTranslate.txt: the common name of every species in the tree and the corresponding assembly name used in the pipeline.
preProcessGREAT.csh: processes the raw GREAT output into a form useful for GO Biological Process data. Alter as necessary to create any kind of subset of the GREAT data.
greatOutput/: the raw GREAT output for each node.

New node info:
node0 = human/chimp/bonobo ancestor
node1 = human/gorilla ancestor
node2 = human/crab-eating macaque ancestor
node3 = human/marmoset ancestor
node4 = human/mouse ancestor
node5 = human/cow/dog ancestor
node6 = placental mammal ancestor
node7 = human/opossum ancestor
node8 = human/platypus ancestor
node9 = human/bird/alligator/turtle ancestor
node10 = tetrapod ancestor (now human/frog)
node11 = human/coelacanth ancestor
node12 = human/fish ancestor
node13 = vertebrate ancestor

Table_human_organoid_unified_peaks_hg38.bed: the original data I was sent.
hg38OrganoidAtac.campLab.4col.bed: 4-column version of the original.
calculateMrca.csh: the most up-to-date (but not final) version of the program that does the bulk of the work for dating our regions.
It first checks the data file and filters out any elements that fall in areas of the target genome that were not alignable in any species. It then compares the remaining elements to the species at the oldest node to determine whether anything is alignable in those ancestors, and writes all of those elements out under whatever number was assigned to that node. (The node assignment is done by hand and is given to this program with the speciesNode.list file in runOne/.)

punchHolesInBed.pl: a Perl script that requires both BED inputs (allSpecies.bed and whatever data it is being compared to) to have 4 columns, with every line unique. It will error if that isn't true; when that is the problem, it typically says something like "chroms don't match:" followed by the line it failed on. To make the lines unique, we typically just append the line number to the end of each line (that is, to the 4th column). The files here that need this format already have it, but here are a couple of ways to do it for other files. I like to use awk:
    cat <input.bed> | cut -f1-4 | awk '{print $0"."NR}' > <output.bed>
You can also use perl:
    cat <input.bed> | cut -f1-4 | perl -ne 'chomp($_);$x+=1;print("$_.$x\n");' > <output.bed>
As a side note, copying and pasting file contents in the terminal tends to turn tabs into spaces. If that happens, you can convert them back with:
    cat <input.bed> | tr " " "\t" > <output.bed>

alignments/: all genome.genome alignments, with the target/reference genome listed first.

runOne/
allSpecies.bed: all of our alignable regions (this was the input file for our calculation of the ancestor).
speciesNode.list: all the species in the alignment and the node of the tree (ancestor) each one belongs to, i.e., its common ancestor with the reference species.
treeShrew.campLabTree.png: an image of the original tree for the first run, with all nodes labelled.
treeShrew.trimmedForCampLab.nh: the Newick file for the tree from the original run with tree shrew.
treeShrew.campDist.txt: relative branch distances for each pairwise set in the whole tree from the first run.

The following files are the unaltered output of our ancestor calculation:
node0.bed node1.bed node2.bed node3.bed node4.bed node5.bed node6.bed node7.bed node8.bed node9.bed

campLabTree.png: tree image with each node labelled with the number of the corresponding ancestor for the files above:
node0 = HumanChimpBonoboAncestor (referred to as HumanChimpAncestor for brevity)
node1 = HumanGorillaAncestor
node2 = HumanMacaqueAncestor
node3 = HumanMarmosetAncestor (primate ancestor)
node4 = HumanTreeShrewAncestor (had no regions assigned; this node will be reworked in the future)
node5 = HumanMouseAncestor
node6 = MammalianAncestor
node7 = TetrapodAncestor (this is the node with 75% of the regions assigned)
node8 = HumanFishAncestor
node9 = VertebrateAncestor

trimmedToATACData/: the bed files from my output for each node. There is no data in this directory from GREAT for node7 or node4: node7 was too large for the background and node4 had no data assigned. These files have been altered so they could be run through GREAT with the background set to the ATAC-seq data I was given (Table_human_organoid_unified_peaks_hg38.bed). They had to be run through the kentutils tool overlapSelect so that their coordinates match the original file exactly, which GREAT needs in order to recognize them. They have not changed substantively.

trimmedToATACData/GreatResults/: all the files made from the trimmed data in the parent directory. As above, there are no results for node7 (too large for the background) or node4 (no data assigned).

greatWholeGenomeResults/: these files cover only nodes 6-9, which have enough assigned regions to calculate enrichments against the whole genome rather than using the original file as the background. All processed data and raw GREAT data are here.
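The input requirements for punchHolesInBed.pl (4 columns, every line unique, tabs rather than pasted spaces) can be bundled into one prep step plus a check you run before the Perl script. This is a minimal sketch, not part of the existing pipeline: make_unique_4col and check_unique_4col are hypothetical helper names, and the check assumes that "unique" means the 4th (name) column never repeats, which is what the awk and perl one-liners above guarantee.

```shell
# Hypothetical helper: normalize a BED file for punchHolesInBed.pl.
# tr -s turns spaces (including runs left by copy/paste) into single tabs,
# cut keeps columns 1-4, and awk appends ".<line number>" to column 4.
make_unique_4col() {
  tr -s ' ' '\t' < "$1" \
    | cut -f1-4 \
    | awk 'BEGIN { OFS = "\t" } { $4 = $4 "." NR; print }'
}

# Hypothetical helper: return non-zero (and report the offending line on
# stderr) if any line lacks exactly 4 tab-separated fields or reuses a name.
check_unique_4col() {
  awk -F '\t' '
    NF != 4    { print "line " NR ": expected 4 columns, found " NF > "/dev/stderr"; bad = 1 }
    seen[$4]++ { print "line " NR ": duplicate name " $4 > "/dev/stderr"; bad = 1 }
    END        { exit bad }
  ' "$1"
}
```

For example, `make_unique_4col data.bed > data.4col.bed && check_unique_4col data.4col.bed` would catch the problem before punchHolesInBed.pl reports "chroms don't match:".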
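The dating step that calculateMrca.csh performs (drop regions alignable in no species, then assign each remaining region to the oldest node whose species still align over it) amounts to taking the largest node number per region, because higher node numbers are older ancestors in both node lists. The sketch below is one way to express that idea in awk and is not the actual csh program. The input layouts are assumptions: a species-to-node table like speciesNode.list (species, tab, node written as "7" or "node7") and a hypothetical precomputed table of region/species alignment hits.

```shell
# Hypothetical sketch of the dating logic, not calculateMrca.csh itself.
# $1: species<TAB>node   (node as "7" or "node7"; larger number = older)
# $2: region<TAB>species (one line per species the region is alignable in)
# Regions absent from $2 (alignable in no species) are never reported.
assign_oldest_node() {
  awk -F '\t' -v OFS='\t' '
    # First file: store the numeric node for each species.
    NR == FNR { n = $2; sub(/^node/, "", n); node[$1] = n + 0; next }
    # Second file: keep the oldest (largest) node seen for each region.
    ($2 in node) {
      if (!($1 in best) || node[$2] > best[$1]) best[$1] = node[$2]
    }
    END { for (r in best) print r, "node" best[r] }
  ' "$1" "$2" | LC_ALL=C sort
}
```

Splitting the output by its second column would then give one file per node, similar in spirit to the node0.bed ... node9.bed files listed above.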
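The reason the trimmedToATACData/ files went through overlapSelect is that GREAT only recognizes regions whose coordinates exactly match the background file, so the output has to be the original peak lines rather than the node intervals. The awk function below is a naive stand-in that illustrates that idea; it is not how overlapSelect is implemented, it compares every peak against every node region (fine for small test files only), and the name select_overlapping_peaks is hypothetical.

```shell
# Hypothetical stand-in for the overlapSelect step: print every peak line
# from $2 (unchanged, so coordinates match the original exactly) that
# overlaps at least one region in $1. BED intervals are half-open.
select_overlapping_peaks() {
  awk -F '\t' '
    # First file: load node regions.
    NR == FNR { chrom[NR] = $1; start[NR] = $2 + 0; end[NR] = $3 + 0; n = NR; next }
    # Second file: emit the peak once if it overlaps any node region.
    {
      for (i = 1; i <= n; i++) {
        if ($1 == chrom[i] && ($2 + 0) < end[i] && ($3 + 0) > start[i]) { print; break }
      }
    }
  ' "$1" "$2"
}
```

For real data, the kentutils overlapSelect tool mentioned above remains the appropriate choice.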
figures/: all the figures I've shown to the group. They were generated with the R ggpubr package, using the ggballoonplot function except where noted.

0_5top20Balloon.pdf (source file 0_5.top20.tsv): the node0-node5 files run on GREAT with the background set to the ATAC-seq data that was sent to me (Table_human_organoid_unified_peaks_hg38.bed). I then processed the output GREAT files (labelled here as nodeXGREATCampLab.tsv) for the items we wanted to display (GO Biological Process) with preProcessGREAT.csh (this script is a work in progress) and sorted by rank to get the top 20 hits.

6_9top20Balloon.pdf (source file 6_9.top20.tsv): the node6-node9 files run on GREAT with the background. Otherwise, I followed the same technique as for the 0_5top20 file.

allDataBalloon.pdf (source file figureInput.tsv): nodes 0-3, 5-6, and 8-9, all run with the ATAC-seq data I was given (Table_human_organoid_unified_peaks_hg38.bed) as the background. Node 4 had no assignments and node 7 had too many hits for the limited ATAC-seq background, so they are not included. The data were processed with preProcessGREAT.csh and then used to make the figure.

allDataVolcano.pdf (source file figureInput.tsv): the same processing as allDataBalloon.pdf, but made with ggscatter instead.
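The "GO Biological Process, sorted by rank, top 20" step used for the balloon plots can be approximated with awk, sort, and head. This is a hedged sketch, not preProcessGREAT.csh: the function name top_great_terms is hypothetical, and the column layout is an assumption (ontology name in column 1 with the literal value "GO Biological Process", rank in column 4, header lines starting with "#"). Check the header of your GREAT .tsv export and pass the correct column numbers.

```shell
# Hypothetical sketch: top N GO Biological Process terms by rank.
# Usage: top_great_terms GREAT_TSV N [ONTOLOGY_COL] [RANK_COL]
# Default columns (1 and 4) are assumptions about the GREAT export layout.
top_great_terms() {
  local file=$1 n=$2 ont_col=${3:-1} rank_col=${4:-4}
  # Skip "#" header lines and keep only GO Biological Process rows.
  awk -F '\t' -v oc="$ont_col" '!/^#/ && $oc == "GO Biological Process"' "$file" \
    | sort -t "$(printf '\t')" -k "${rank_col},${rank_col}n" \
    | head -n "$n"
}
```

For example, `top_great_terms node0GREATCampLab.tsv 20` would produce a top-20 table per node, which could then be combined into a source file like 0_5.top20.tsv.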