campLabSpace/

datedHumanGenome/
contains one file for each node of the tree, listing the human regions assigned to that node.

allSpecies.bed contains all of our alignable regions in the human-referenced alignments (this was the input file for our ancestor calculation for human). It now includes zebrafish, so it is different from the previous version in runOne/.

speciesNode.list is different from the one in runOne/. It specifies how each species is related to human (which node is its common ancestor with human) in the human-referenced alignments, and is the input for running calculateMrca.csh.

campTree.png holds an image of the tree with labelled nodes that correspond to the speciesNode.list file.

campLabTree.nh
the Newick file of the new tree

campDist.txt
relative branch distances for each pairwise comparison in the tree

speciesTranslate.txt
contains the common names of every species in the tree and the corresponding assembly name used for the pipeline

preProcessGREAT.csh 
this script processes the raw GREAT output into a table of GO Biological Process results. Alter as necessary to create any other subset of the GREAT data.
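A minimal sketch of the kind of subsetting preProcessGREAT.csh does, for anyone adapting it to another subset. It assumes the raw GREAT file is tab-separated, that comment and header lines start with "#", and that the first column names the ontology (e.g. "GO Biological Process"); check that against the files in greatOutput/. The function name subset_great is hypothetical and not part of the existing scripts.

```shell
#!/usr/bin/env bash
# subset_great RAW_GREAT.tsv [ONTOLOGY]
# Hypothetical helper: prints only the rows of a raw GREAT table that belong
# to one ontology. ASSUMPTION: tab-separated, "#" comment/header lines, and
# the ontology name in column 1.
subset_great() {
    local great_file="$1"
    local ontology="${2:-GO Biological Process}"
    awk -F'\t' -v ont="$ontology" '
        /^#/ { next }          # skip comment and header lines
        $1 == ont { print }    # keep rows from the requested ontology
    ' "$great_file"
}
```

For example, subset_great node0GREATCampLab.tsv > node0.GOBP.tsv would keep the GO Biological Process rows (the output name is just an example); pass another ontology name as the second argument for a different subset.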

greatOutput/
this directory has the raw GREAT output for each node.

New node info:

node0= human/chimp/bonobo ancestor
node1= human/gorilla ancestor
node2= human/crab eating macaque ancestor
node3= human/marmoset ancestor
node4= human/mouse ancestor
node5= human/cow/dog ancestor
node6= placental mammal ancestor
node7= human/opossum ancestor
node8= human/platypus ancestor
node9= human/bird/alligator/turtle ancestor
node10= tetrapod ancestor (now human/frog)
node11= human/coelacanth ancestor
node12= human/fish ancestor
node13= vertebrate ancestor 

Table_human_organoid_unified_peaks_hg38.bed: the original data I was sent
hg38OrganoidAtac.campLab.4col.bed: 4-column version of the original

calculateMrca.csh (most up-to-date version, but not final)
this is the program that does the bulk of the work of dating our regions. It first checks the data file and filters out any elements that fall in areas of the target genome that weren't alignable in any species. Then, starting with the species of the eldest node, it checks whether each element is alignable in those species (and so was present in that ancestor) and writes those elements to the file for the number assigned to that node (node numbers are assigned by hand and are given to this script in the speciesNode.list file in runOne/).
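The node-assignment logic described above can be sketched like this. It assumes two hypothetical inputs: speciesNode.list as two tab-separated columns (species, node number), and a table regionSpecies.tsv with one line per region and species it is alignable in. Because node numbers increase with age here (node0 is the youngest ancestor), the eldest ancestor is simply the largest node number. This is not the code in calculateMrca.csh, only an illustration of the idea; assign_oldest_node is a hypothetical name.

```shell
#!/usr/bin/env bash
# assign_oldest_node SPECIES_NODE_LIST REGION_SPECIES_TSV
# Hypothetical sketch: prints "region<TAB>nodeN" where N is the largest
# (oldest) node number among the species the region is alignable in.
# ASSUMPTION: both inputs are tab-separated two-column files as described.
assign_oldest_node() {
    awk -F'\t' '
        # First file (speciesNode.list): remember each species node number.
        FNR == NR { node[$1] = $2 + 0; next }
        # Second file: only species listed in speciesNode.list count, so
        # regions with no listed species are dropped (not alignable anywhere).
        ($2 in node) {
            if (!($1 in best)) { best[$1] = node[$2]; order[++n] = $1 }
            else if (node[$2] > best[$1]) best[$1] = node[$2]
        }
        # Print regions in first-seen order with their oldest node.
        END { for (i = 1; i <= n; i++) print order[i] "\tnode" best[order[i]] }
    ' "$1" "$2"
}
```

Taking the maximum node per region gives the same answer as walking from the eldest node down and removing regions as they are assigned, which is the order the description above uses.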

punchHolesInBed.pl
this is a perl script that requires both bed inputs (allSpecies.bed and whatever data it's being compared to) to have exactly 4 tab-separated columns, with a unique name in the 4th column on every line. It will error if that isn't true (typically it'll say something like "chroms don't match:" followed by the line it failed on). To make the names unique we typically just append the line number each one appears on. The files here that need this format already have it, but here are a couple of ways to do it for other files. I like to use awk:

cat <file.bed> | cut -f1-4 | awk '{print $0"."NR}' > <file.4colUnique.bed>

you can also use perl:

cat <file.bed> | cut -f1-4 | perl -ne 'chomp($_);$x+=1;print("$_.$x\n");' > <file.4colUnique.bed>

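Both one-liners above append the line number to whatever ends up last on the line, so they assume the input already has at least 4 columns; on a 3-column bed they would tack the number onto the end coordinate. A slightly more defensive version is sketched below; make_unique_bed4 and the "region" placeholder name are hypothetical choices, not something the pipeline expects.

```shell
#!/usr/bin/env bash
# make_unique_bed4 IN.bed
# Hypothetical helper: writes exactly 4 tab-separated columns to stdout.
# Column 4 is the existing name (or the placeholder "region" when the file
# has only 3 columns) plus "." and the line number, e.g. "peak7.12".
make_unique_bed4() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        {
            name = (NF >= 4 && $4 != "") ? $4 : "region"
            print $1, $2, $3, name "." NR
        }' "$1"
}
```

Usage: make_unique_bed4 <file.bed> > <file.4colUnique.bed>. For files that already have a 4th column this produces the same names as the awk one-liner.
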
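as a side note, when you copy and paste text in the terminal, tabs tend to get converted to spaces (often several spaces per tab), so if that happens you can use this to turn them back into tabs (-s squeezes each run of spaces into a single tab):

cat <file> | tr -s " " "\t" > <newFile>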

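Before running punchHolesInBed.pl it can save time to check that a file really has 4 tab-separated columns with unique names, since leftover spaces from copying show up as the wrong number of fields. A sketch of such a check (check_bed4 is a hypothetical helper, not part of the existing scripts):

```shell
#!/usr/bin/env bash
# check_bed4 FILE
# Hypothetical helper: returns 0 if every line has exactly 4 tab-separated
# fields and no 4th-column name repeats; otherwise prints the first problem
# to stderr and returns 1.
check_bed4() {
    awk -F'\t' '
        NF != 4 { printf "line %d: expected 4 tab-separated fields, found %d\n", NR, NF > "/dev/stderr"; bad = 1; exit }
        ($4 in seen) { printf "line %d: duplicate name %s (first seen on line %d)\n", NR, $4, seen[$4] > "/dev/stderr"; bad = 1; exit }
        { seen[$4] = NR }
        END { exit bad + 0 }
    ' "$1"
}
```

Usage: check_bed4 <file.4colUnique.bed> && echo ok
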
alignments/

this directory contains all the pairwise genome alignments, named genome.genome with the target/reference genome listed first.

runOne/

allSpecies.bed contains all of our alignable regions (this was the input file for our ancestor calculation).

speciesNode.list
lists all the species in the alignment and which node of the tree (ancestor) each one belongs to, i.e. its common ancestor with the reference species.

treeShrew.campLabTree.png
this has an image of the original tree from the first run, with all nodes labelled.

treeShrew.trimmedForCampLab.nh
newick file for tree from the original run with treeShrew

treeShrew.campDist.txt
has relative branch distances for each pairwise comparison in the whole tree from the first run

These files are the unaltered output of our ancestor calculation:
node0.bed
node1.bed
node2.bed
node3.bed
node4.bed
node5.bed
node6.bed
node7.bed
node8.bed
node9.bed

Image of the tree with each node labelled with the number of its ancestor, matching the files above:
campLabTree.png

node0 = HumanChimpBonoboAncestor (referred to as HumanChimpAncestor for brevity)
node1 = HumanGorillaAncestor
node2 = HumanMacaqueAncestor
node3 = HumanMarmosetAncestor (primate ancestor)
node4 = HumanTreeShrewAncestor (had no regions assigned; this node will be reworked in the future)
node5 = HumanMouseAncestor
node6 = MammalianAncestor
node7 = TetrapodAncestor (this node was assigned 75% of the regions)
node8 = HumanFishAncestor
node9 = VertebrateAncestor

trimmedToATACData/
	the items in this directory are the bed files from my output for each node, altered so they could be run through GREAT with the background set to the ATAC-seq data I was given (Table_human_organoid_unified_peaks_hg38.bed). They had to be run through kentutils overlapSelect so that their coordinates match the original file exactly, which GREAT needs in order to recognize them. Their content has not substantively changed. There are no GREAT results here for node4 or node7: node4 had no regions assigned, and node7 was too large for this background.
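The trimming amounts to keeping every ATAC-seq peak that overlaps a region assigned to the node, written out with the peak's own coordinates. With kentutils this is roughly overlapSelect <nodeX.bed> Table_human_organoid_unified_peaks_hg38.bed <nodeX.trimmed.bed> (select file, then input file, then output file), although the exact options used for these files aren't recorded here. For a quick check without kentutils, a naive awk version of the same idea is sketched below; select_overlapping is hypothetical and only practical for small files.

```shell
#!/usr/bin/env bash
# select_overlapping SELECT.bed IN.bed
# Hypothetical helper: prints each record of IN.bed unchanged (so its
# coordinates are kept exactly) if it overlaps any record of SELECT.bed.
# BED intervals are half-open, so [s1,e1) and [s2,e2) overlap when
# s1 < e2 and s2 < e1. Naive O(n*m) scan, fine only for small files.
select_overlapping() {
    awk -F'\t' '
        # First file: store selector intervals per chromosome.
        FNR == NR { n[$1]++; s[$1, n[$1]] = $2 + 0; e[$1, n[$1]] = $3 + 0; next }
        # Second file: print the record on its first overlap.
        {
            for (i = 1; i <= n[$1]; i++) {
                if ($2 + 0 < e[$1, i] && s[$1, i] < $3 + 0) { print; next }
            }
        }
    ' "$1" "$2"
}
```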

trimmedToATACData/GreatResults/
this directory contains all the files made from the trimmed data in the parent directory. Node7 was too large for the background and node4 had no data assigned.

greatWholeGenomeResults/
	these files are only for nodes 6-9, which had enough assigned regions to calculate enrichments against the whole genome rather than against the original ATAC-seq file as the background. All processed data and raw GREAT output are here.

figures/
	this directory has all the figures I've shown to the group.

The figures were generated with the ggballoonplot function from the R package ggpubr, except where noted below.

0_5top20Balloon.pdf (source file 0_5.top20.tsv)
	These data are the node0-node5 files run on GREAT with the background set to the ATAC-seq data that was sent to me (Table_human_organoid_unified_peaks_hg38.bed). I then processed the output GREAT files (labelled here as nodeXGREATCampLab.tsv) with preProcessGREAT.csh (this script is a work in progress) to keep the items we wanted to display (GO Biological Process), and sorted by rank to get the top 20 hits.
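The ranking step for the top-20 tables can be sketched as follows. It assumes each processed file has no header line and that the rank sits in a numeric column where 1 is the best hit (assumed here to be BinomRank in column 4; check the header line of your GREAT files). It also assumes the combined table, like 0_5.top20.tsv, carries a node label in its first column; top_n_by_rank and the node0.GOBP.tsv style file names are hypothetical.

```shell
#!/usr/bin/env bash
# top_n_by_rank N RANK_COL FILE...
# Hypothetical helper: for each file, keeps the N rows with the smallest
# value in column RANK_COL and prefixes each row with the file name minus
# ".tsv", so several nodes can be combined into one plotting table.
top_n_by_rank() {
    local n="$1" col="$2"
    shift 2
    local f
    for f in "$@"; do
        # numeric sort on the rank column only, then keep the first N rows
        sort -t $'\t' -k "${col},${col}n" "$f" | head -n "$n" |
            awk -F'\t' -v label="$(basename "$f" .tsv)" 'BEGIN { OFS = "\t" } { print label, $0 }'
    done
}
```

Usage, with hypothetical file names: top_n_by_rank 20 4 node{0..5}.GOBP.tsv > 0_5.top20.tsv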

6_9top20Balloon.pdf (source file 6_9.top20.tsv)
	These data are the node6-9 files run on GREAT with the background. Otherwise, I followed the same technique as in the 0_5top20 file.

allDataBalloon.pdf (source file figureInput.tsv)
	These data are nodes 0-3, 5-6, and 8-9, all run with the background set to the ATAC-seq data I was given (Table_human_organoid_unified_peaks_hg38.bed). Node4 had no assignments and node7 had too many hits for the limited ATAC-seq background, so they are not included. The data were processed with preProcessGREAT.csh and then used to make the figure.

allDataVolcano.pdf (source file figureInput.tsv)
	same processing as allDataBalloon, but made with ggscatter instead.