A Data Science Central Community
Benzene (C6H6) is the last fundamental chemical compound to have had its atomic structure uncovered. This discovery led the path to
Major innovation in combinatorial chemistry today is rooted in the discovery and understanding of complex, unusual, even bizarre atomic structures such as that of benzene, and in the application of advanced analytic principles.
Let's start with a crash course in chemistry (section 1). Then I'll explain how analytics helps create these incredibly powerful and useful new technologies (section 2).
1. Basic Chemistry Tutorial
Molecules are made of atoms. Atoms are the "prime number" entities that generate all the molecules. There are about 100 types of atoms in our universe, ordered by their atomic number: for a comprehensive list, check out the periodic table of elements. Molecules that contain at least two different types of atoms are called compounds.
Examples of atoms include H (Hydrogen), C (Carbon), O (Oxygen), Cl (Chlorine), Na (Sodium). Examples of compounds include C6H6 (benzene), H2O (water), NaCl (salt), CH4 (methane).
With the 100 or so fundamental atoms, one can create an infinite number of molecules, and a finite number (say f(n)) of molecules with exactly n atoms. Note that a benzene molecule has 12 atoms (6 carbon + 6 hydrogen), water has 3, salt has 2 and methane has 5. Also, the function f(n) grows incredibly fast, much faster than exponential. The possibilities for new molecule creations (that is, combinations of atoms) are endless. The term combinatorial chemistry has been used in this context.
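To make the atom counts above concrete, here is a minimal Python sketch that counts the atoms in a formula. The function names count_atoms and total_atoms are mine, and the parser assumes a flat formula such as C6H6 (no parentheses, no charges).

```python
import re

def count_atoms(formula):
    """Map each element symbol to its count in a flat formula such as 'C6H6'."""
    counts = {}
    # An element symbol is one uppercase letter plus an optional lowercase
    # letter; a missing number after the symbol means a count of 1.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def total_atoms(formula):
    """Total number of atoms in one molecule of the given formula."""
    return sum(count_atoms(formula).values())

for name, formula in [("benzene", "C6H6"), ("water", "H2O"),
                      ("salt", "NaCl"), ("methane", "CH4")]:
    print(name, formula, total_atoms(formula))
```

The loop prints 12, 3, 2 and 5 atoms respectively, matching the numbers quoted in the text.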
Organic chemistry is about molecules built around carbon. In this article, I focus on the ones that contain only H, O, C (in any quantity) and no other atoms. Not all combinations are permitted or stable: for instance, HO is not a stable molecule, while H2O is.
Whether a molecule exists or not depends on the number of electrons on the outermost layer of each of its individual atoms, and on whether bonds can be created so that each atom (ideally) ends up with the equivalent of 8 electrons on its outermost layer (2 for hydrogen) by sharing electrons with adjacent atoms within the molecule.
Bonds can be single, double, triple or quadruple.
Examples of bonding: water (H2O) at the top, salt (NaCl) at the bottom
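When Na (Sodium) and Cl (Chlorine) bond together to form NaCl (salt), the isolated Na electron on the outermost layer moves to the outermost layer of the Cl atom, and equilibrium (8 electrons on the outermost layer) is reached. Water reaches the same kind of equilibrium by sharing electrons: the O atom shares one electron pair with each of the two H atoms.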
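The bonding rule described in this section can be turned into a small check. The sketch below is a simplification and not a chemistry package: it assumes that H always forms 1 bond, O forms 2 and C forms 4 (the number of shared electron pairs each needs to fill its outermost layer), and it only handles these three atoms. The names VALENCE and is_valid are mine.

```python
# Number of bonds each atom needs in this simplified model (an assumption
# limited to H, C and O; ionic compounds such as NaCl are not covered).
VALENCE = {"H": 1, "O": 2, "C": 4}

def is_valid(atoms, bonds):
    """Check that every atom uses exactly its allowed number of bonds.

    atoms: list of element symbols, e.g. ["O", "H", "H"]
    bonds: list of (i, j, order) tuples, where i and j index into atoms and
           order is 1 for a single bond, 2 for a double bond, and so on.
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return all(used[k] == VALENCE[atoms[k]] for k in range(len(atoms)))

water = (["O", "H", "H"], [(0, 1, 1), (0, 2, 1)])
methane = (["C", "H", "H", "H", "H"],
           [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)])
carbon_dioxide = (["C", "O", "O"], [(0, 1, 2), (0, 2, 2)])  # two double bonds
ho = (["H", "O"], [(0, 1, 1)])  # the O atom is left one bond short

print(is_valid(*water), is_valid(*methane),
      is_valid(*carbon_dioxide), is_valid(*ho))
```

Water, methane and carbon dioxide pass the check, while HO fails because its oxygen atom has only one of the two bonds it needs.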
2. The Analytic Path to Innovation
Now let's discuss the two applications introduced earlier.
A test for data science applicants
First, stop reading, and answer the following question: how can benzene be represented, given its formula C6H6 and the bonding constraints described above? Everything you need to answer this question is explained above. It is indeed a difficult question, the kind that Google would love to ask future hires, and it took decades before a solution was eventually found. (I provided the answer in the picture at the bottom; a line segment represents a single bond; a double line segment represents a double bond.)
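If you want a hint without looking at the picture, the bonding constraints alone already tell you something. The sketch below again assumes that H forms 1 bond, O forms 2 and C forms 4. Each bond uses up one bonding slot on each of its two atoms, so the total bond order is half the number of slots. A molecule with no ring and only single bonds needs exactly (number of atoms - 1) bonds to hold together, so anything left over must go into rings or multiple bonds. The function name extra_bonds is mine.

```python
def extra_bonds(c, h, o=0):
    """Bond orders left over after connecting all atoms as a ring-free,
    single-bonded skeleton (assumes valences C=4, H=1, O=2)."""
    slots = 4 * c + h + 2 * o          # total bonding slots over all atoms
    if slots % 2:
        # Every bond consumes two slots, so an odd total cannot be satisfied.
        raise ValueError("no structure can satisfy all valences")
    total_bond_order = slots // 2
    skeleton_bonds = c + h + o - 1     # bonds in a tree-shaped molecule
    return total_bond_order - skeleton_bonds

print(extra_bonds(1, 4))      # methane CH4
print(extra_bonds(0, 2, 1))   # water H2O
print(extra_bonds(6, 6))      # benzene C6H6
```

Methane and water give 0, meaning single bonds and no ring. Benzene gives 4, so any valid representation must combine rings and double (or triple) bonds that add up to 4. For HO, the function raises an error, consistent with the earlier remark that HO is not a stable molecule.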
2-D organic molecules
Drug companies have been among the first to realize that it could make sense to create a catalog of all the potential molecules of at most n atoms, each atom being either H, C or O. These millions of potential molecules can be easily clustered (using statistical clustering techniques) and their medicinal properties can be guessed even before the first one is manufactured, or even if it can't be manufactured. Currently, about 100,000 such molecules are created each year, with the help of computer simulation.
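As a rough illustration of how such a catalog could start, here is a Python sketch that lists every candidate formula with at most n atoms made of C, H and O. This is my own simplification, not the method used by drug companies: it enumerates formulas only (one formula can correspond to many different molecules), and it applies a single crude filter based on the assumed valences C=4, H=1, O=2. The filter is necessary but far from sufficient, so many of the formulas it keeps do not correspond to a stable molecule. The function names are hypothetical.

```python
from itertools import combinations_with_replacement

ELEMENTS = ("C", "H", "O")

def formulas_with_n_atoms(n):
    """Yield (c, h, o) atom counts for every formula with exactly n atoms."""
    for combo in combinations_with_replacement(ELEMENTS, n):
        yield tuple(combo.count(e) for e in ELEMENTS)

def candidate_catalog(max_atoms):
    """List (c, h, o) counts with at most max_atoms atoms passing a crude filter."""
    catalog = []
    for n in range(1, max_atoms + 1):
        for c, h, o in formulas_with_n_atoms(n):
            # Each bond uses two bonding slots, so the total number of slots
            # must be even for all valences to be satisfied.
            if (4 * c + h + 2 * o) % 2 == 0:
                catalog.append((c, h, o))
    return catalog

print(len(list(formulas_with_n_atoms(3))))  # formulas with exactly 3 atoms
print((0, 2, 1) in candidate_catalog(3))    # H2O is in the catalog
```

Each entry (c, h, o) can serve as a very simple numeric descriptor of a candidate. A real clustering step would need much richer descriptors, based on the actual structure of each molecule.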
3-D carbon molecules
What if instead of using H, O and C, you use just C alone? Can you create C2, C3, C4, and so forth? Which ones can you create, and which ones cannot exist?
It turns out that this is a very difficult question. Instead of creating planar molecules, you must consider molecules with a 3-dimensional atomic structure. The first to be discovered was buckminsterfullerene (C60) in 1985. Its atomic structure (a sphere) looks like a soccer ball. I'm not sure it has any value, but it led to the discovery of a famous class of cylindrical carbon molecules called nanotubes. Larger closed cages such as C74 belong to the same family as C60, while nanotubes have no fixed number of atoms. These molecules are about to create a new industrial revolution, with the creation of incredibly strong, light cables (with walls one atom thick!!) with incredible thermal and electrical properties.
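The soccer ball shape of C60 can be checked with a bit of arithmetic. The sketch below assumes a closed cage in which every carbon atom is bonded to exactly 3 neighbors and every face is a pentagon or a hexagon. It then applies Euler's formula for convex polyhedra (vertices - edges + faces = 2). The function name cage_faces is mine. The result is a necessary count only: it does not prove that a cage with that many atoms can actually be built.

```python
def cage_faces(n_atoms):
    """Return (pentagons, hexagons) for a closed carbon cage with n_atoms atoms.

    Assumes each atom has exactly 3 bonds and all faces are pentagons or
    hexagons; these are modeling assumptions, not a stability test.
    """
    if n_atoms % 2 or n_atoms < 20:
        raise ValueError("need an even number of atoms, at least 20")
    edges = 3 * n_atoms // 2           # each bond joins two atoms
    faces = 2 - n_atoms + edges        # Euler: V - E + F = 2
    # With p pentagons and h hexagons: p + h = faces and 5p + 6h = 2 * edges,
    # since each edge borders two faces. Solving gives p = 6*faces - 2*edges.
    pentagons = 6 * faces - 2 * edges
    hexagons = faces - pentagons
    return pentagons, hexagons

print(cage_faces(60))
```

For C60 this gives 12 pentagons and 20 hexagons, the same pattern as a classic soccer ball. Under these assumptions the number of pentagons is always 12, whatever the number of atoms.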