
Analytics{Benzene} => {big Pharma, Nanotechnologies}

Benzene (C6H6) is the last fundamental chemical compound to have had its atomic structure uncovered. This discovery paved the way for:

  • Creating large numbers of complex synthetic molecules using computer simulations and deep mathematics (which in turn led to the explosion of synthetic drugs in big pharma)
  • Creating very useful but weird carbon molecules using computer simulations and deep mathematics (which in turn led to the explosion of nanotechnologies)

Big innovation in combinatorial chemistry today is rooted in the discovery and understanding of complex, unusual, bizarre atomic structures such as Benzene, and the application of advanced analytic principles.

Let's start with a crash course in chemistry (section 1). Then I'll explain how analytics helps create these incredibly powerful and useful new technologies (section 2).

1. Basic Chemistry Tutorial

Molecules are made of atoms. Atoms are the "prime number" entities that generate all the molecules. There are about 100 types of atoms in our universe, ordered by their atomic number: for a comprehensive list, check out the periodic table of elements. Molecules that contain at least two different types of atoms are called compounds.

Examples of atoms include H (Hydrogen), C (Carbon), O (Oxygen), Cl (Chlorine), Na (Sodium). Examples of compounds include C6H6 (benzene), H2O (water), NaCl (salt), CH4 (methane).

With the 100 or so fundamental atoms, one can create an infinite number of molecules, but only a finite number (say f(n)) of molecules with exactly n atoms. Note that a benzene molecule has 12 atoms (6 carbon + 6 hydrogen), water has 3, salt has 2 and methane has 5. Also, the function f(n) grows incredibly fast, much faster than exponential. The possibilities for creating new molecules (that is, new combinations of atoms) are endless. The term combinatorial chemistry has been used in this context.
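The atom counts quoted above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `atom_counts` is made up for this illustration, and it assumes simple formulas with no parentheses or charges.

```python
import re
from collections import Counter

def atom_counts(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C6H6' (no parentheses, no charges)."""
    counts = Counter()
    # An element symbol is one uppercase letter plus an optional lowercase letter,
    # followed by an optional count (a missing count means 1).
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts

for name, formula in [("benzene", "C6H6"), ("water", "H2O"),
                      ("salt", "NaCl"), ("methane", "CH4")]:
    counts = atom_counts(formula)
    print(name, dict(counts), "total atoms:", sum(counts.values()))
```

The totals printed are the values of n used in the text: 12, 3, 2 and 5.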

Organic chemistry is about carbon-based molecules. In this article we only consider those that contain H, O, C (in any quantity) and no other atoms. Not all combinations are permitted or stable: for instance HO is not a stable molecule, while H2O is.


Whether a molecule can exist depends on the number of electrons on the outermost layer of each of its atoms, and on whether bonds can be created so that each atom (ideally) has the equivalent of 8 electrons on its outermost layer (2 for hydrogen) by sharing electrons with adjacent atoms within the molecule.

Bonds between H, O and C atoms can be single, double or triple.

  • Hydrogen: 1 electron on outermost layer; forms one single bond
  • Oxygen: 6 electrons on outermost layer; forms two bonds in total (two single bonds or one double bond)
  • Carbon: 4 electrons on outermost layer; forms four bonds in total (for instance four single bonds, or two double bonds)
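These bonding rules can be turned into a simple check. The sketch below assumes the standard bond counts (H forms 1 bond, O forms 2, C forms 4, with a double bond counting as 2 and a triple bond as 3). The function name `satisfies_valence` and the way a molecule is encoded (a list of atoms plus a list of bonds) are choices made for this illustration only.

```python
# Number of bonds each atom forms, counting a double bond as 2 and a triple bond as 3.
VALENCE = {"H": 1, "O": 2, "C": 4}

def satisfies_valence(atoms, bonds):
    """atoms: list of element symbols; bonds: list of (i, j, order) tuples,
    where i and j are positions in `atoms` and order is 1, 2 or 3."""
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return all(used[k] == VALENCE[atoms[k]] for k in range(len(atoms)))

molecules = {
    "water (H2O)": (["O", "H", "H"], [(0, 1, 1), (0, 2, 1)]),
    "methane (CH4)": (["C", "H", "H", "H", "H"],
                      [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)]),
    "carbon dioxide (CO2)": (["C", "O", "O"], [(0, 1, 2), (0, 2, 2)]),
    # "HO": oxygen is left one bond short, so the check fails.
    "HO": (["O", "H"], [(0, 1, 1)]),
}

for name, (atoms, bonds) in molecules.items():
    print(name, "->", satisfies_valence(atoms, bonds))
```

Passing this check is necessary but not sufficient: it says nothing about geometry or stability, only that every atom has the right number of bonds.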

Examples of bonding: water (H2O) at the top, salt (NaCl) at the bottom

  • Water: oxygen has a single covalent bond with each of the two hydrogen atoms
  • Salt: sodium lets chlorine use its valence electron

When Na (Sodium) and Cl (Chlorine) bond together to form NaCl (salt), the isolated Na electron on the outermost layer moves to the outermost layer of the Cl atom, which already holds 7 electrons, and equilibrium (8 electrons on the outermost layer) is reached. Water reaches the same count by sharing electrons rather than transferring one.

2. The Analytic Path to Innovation

Now let's discuss the two applications introduced earlier.

A test for data science applicants

First, stop reading, and answer the following question: how can benzene be represented, given its formula C6H6, and the bonding constraints described above? Everything you need to answer this question is explained above. It is indeed a difficult question - the kind that Google would love to ask future hires - and it took decades before a solution was eventually found. (I provided the answer in the picture at the bottom; a line segment represents a single bond; a double line segment represents a double bond.)
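As a hint rather than a full answer, the bonding constraints already fix how many bonds benzene must contain. The arithmetic below is a sketch that assumes the standard bond counts (C forms 4 bonds, H forms 1) and assumes every hydrogen is attached to a carbon, which must be the case in a single connected molecule since a hydrogen has only one bond to give. The variable names are illustrative.

```python
VALENCE = {"C": 4, "H": 1}
formula = {"C": 6, "H": 6}  # benzene

# Every bond unit (a single bond = 1, a double bond = 2) uses up one
# valence slot on each of the two atoms it connects.
total_valence = sum(VALENCE[a] * n for a, n in formula.items())
bond_units = total_valence // 2

# Each hydrogen spends its only bond on a carbon.
ch_units = formula["H"]
cc_units = bond_units - ch_units

# A plain chain of 6 carbons joined by single bonds would use only 5 units.
chain_units = formula["C"] - 1
extra_units = cc_units - chain_units

print("bond units in total:", bond_units)
print("units between carbon atoms:", cc_units)
print("units beyond a single-bonded chain:", extra_units)
```

The carbons must therefore share more bond units than a simple chain allows, so any valid structure needs ring closures, multiple bonds, or both.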

2-D organic molecules

Drug companies have been among the first to realize that it could make sense to create a catalog of all the potential molecules of at most n atoms, each atom being either H, C or O. These millions of potential molecules can be easily clustered (using statistical clustering techniques) and their medicinal properties can be guessed even before the first one is manufactured, or even if it can't be manufactured. Currently, about 100,000 such molecules are created each year, with the help of computer simulation. 
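To give a flavor of the clustering step, here is a toy sketch: a bare-bones k-means on six real molecules, each described only by its number of C, H and O atoms. Atom counts are a deliberately crude stand-in for the richer descriptors a drug company would use, and the choice of k-means, the molecule list and the starting centroids are all assumptions made for this illustration.

```python
# Each molecule is described by its (C, H, O) atom counts.
molecules = {
    "methane CH4": (1, 4, 0),
    "benzene C6H6": (6, 6, 0),
    "water H2O": (0, 2, 1),
    "methanol CH4O": (1, 4, 1),
    "phenol C6H6O": (6, 6, 1),
    "cyclohexane C6H12": (6, 12, 0),
}

def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iter=100):
    """Plain k-means; returns the cluster label of each point."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(len(centroids)),
                          key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so we are done
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels

points = list(molecules.values())
# Start from the first two molecules (methane and benzene) as centroids.
labels = kmeans(points, centroids=points[:2])

for name, label in zip(molecules, labels):
    print(label, name)
```

On this tiny data set the two clusters separate the small molecules (methane, water, methanol) from the six-carbon ones (benzene, phenol, cyclohexane).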

3-D carbon molecules

What if instead of using H, O and C, you use just C alone? Can you create C2, C3, C4, and so forth? Which of these can you create, and which ones cannot exist?

It turns out that this is a very difficult question. Instead of creating planar molecules, you must consider molecules with a 3-dimensional atomic structure. The first to be discovered was buckminsterfullerene (C60) in 1985. The atomic structure (sphere) looks like a soccer ball. I'm not sure it has any value, but it led to the discovery of a famous class of (cylinder) carbon molecules called nanotubes, the best known being C74. These molecules are about to create a new industrial revolution, with the creation of incredibly strong, light (one atom thick!!) cables with incredible thermal and electrical properties.
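The soccer-ball picture explains where the number 60 comes from. A standard soccer ball has 12 pentagons and 20 hexagons, with three faces meeting at every corner. The sketch below counts corners and edges from that description, then applies the four-bonds-per-carbon rule from section 1. The variable names are illustrative.

```python
pentagons, hexagons = 12, 20
faces = pentagons + hexagons

# Summing the corners of all faces counts each corner 3 times
# (3 faces meet at every corner) and each edge 2 times.
corner_slots = 5 * pentagons + 6 * hexagons
vertices = corner_slots // 3   # one carbon atom per corner
edges = corner_slots // 2      # one carbon-carbon link per edge

# Sanity check: Euler's formula for a sphere-like polyhedron.
assert vertices - edges + faces == 2

# Each carbon forms 4 bonds but has only 3 neighbors, so some
# links must be double bonds.
bond_units = vertices * 4 // 2
double_bonds = bond_units - edges

print("carbon atoms:", vertices)
print("links between atoms:", edges)
print("double bonds needed:", double_bonds)
```

So a cage made of 12 pentagons and 20 hexagons has exactly 60 corners, one per carbon atom, and the valence rule is met when 30 of its 90 links are double bonds.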


Buckminsterfullerene (C60)

Nanotubes (C74)

Benzene (C6H6)


Comment by Mike Griffiths on February 11, 2013 at 8:38am

Oh, the beauty of combinatorial mathematics! Coupled with what we already know about similarly clustered compounds, as you've stated above, a pathway to chemical behavior is possible. If only human beings behaved in such an ordered fashion, now that would be useful.

Comment by Vincent Granville on February 10, 2013 at 3:30pm

Hi Gary - I meant that C6H6's 2-dimensional atomic structure was really understood about 100 years ago, which in my opinion is very recent, given the fact that far more complex molecules (e.g. alcohol) were understood far earlier.  

Could be also the fact that my sentence would be correct in French, but maybe not in English, as tenses have very, very subtle (conflicting) differences between both languages, and some of these English subtleties are still mysterious to me.

Comment by Gary D. Miner, Ph.D. on February 10, 2013 at 3:17pm

I'm wondering, Vincent, if you made a "typo" in your first sentence? (The tense of the first sentence implies that the structure of benzene was "just recently discovered", but I learned this in 1964 in my ORGANIC CHEMISTRY class.)

"Benzene (C6H6) is the last fundamental chemical compound to have had is atomic structure uncovered...." Did you mean to write "was the last ......"?

I read through the entire essay to see if for some reason I was not remembering correctly, but when I got to your chemical structure of benzene it was exactly as I had "memorized it 3-dimensionally" almost 50 years ago. Am I hallucinating?



