To answer the question of whether a college-bound student should take an AP math or an AP science course, this research analyzed national exam-participation figures for 11th- and 12th-grade high school students. Because most high schools require four years of math but only three years of science (two of them with a lab), AP math was expected to have higher exam participation. Moreover, college-bound high school students were expected to take an AP math exam before an AP science exam, because the top U.S. universities offering STEM degrees require a minimum of four years of advanced math for admission.
Over the past 20 years, the percentage of high school students completing advanced mathematics and science courses has increased substantially. The percentage of graduates completing advanced math courses such as Precalculus rose from 13% in 1990 to 35% in 2009, and the percentage completing advanced science courses, including Biology, Chemistry, and Physics, rose from 19% in 1990 to 30% in 2009 (Digest of Education Statistics, 2015, Table 225.40). The NCES Digest of Education Statistics (2015) reported more than 41,000 public and private high schools in the United States (Digest of Education Statistics, 2015, Table 214.10).
More and more high schools across the U.S. are using the Advanced Placement (AP) Program to strengthen their curricula with rigorous, college-preparatory coursework. The College Board reported that 21,953 U.S. high schools participated in the AP Program in 2015 (The College Board, 2016). The Associated Press (2012) reported that 18 percent of U.S. high school graduates had passed at least one AP exam, up from 11 percent a decade earlier.
The top-down decision tree shown below was built with Rattle, the graphical data-mining interface for R, which fits classification trees with the rpart package. A classification tree was chosen because the algorithm does most of the complex work on its own and requires limited tuning from a novice who is still learning the craft. The paragraphs following Figure 1 (Decision Tree AP Program Summary) and Figure 2 (Summary of the Decision Tree model for Classification) explain the structure of the tree in detail.
Figure 1: Decision Tree AP Program Summary
Summary of the Decision Tree model for Classification (built using 'rpart'):

n= 72

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 72 63 BIOLOGY (0.12 0.097 0.083 0.11 0.12 0.12 0.097 0.11 0.12)
2) X2015.Students.who.took.AP=118,707,152,745,22,789,302,532,52,678 36 28 CHEMISTRY (0 0.19 0.17 0.22 0 0 0.19 0.22 0)
3) X2015.Students.who.took.AP=171,074,195,526,20,533,223,479 36 27 BIOLOGY (0.25 0 0 0 0.25 0.25 0 0 0.25)
4) X2015.Students.who.took.AP=118,707,22,789,302,532 20 13 CALCULUS AB (0 0.35 0.3 0 0 0 0.35 0 0)
5) X2015.Students.who.took.AP=152,745,52,678 16 8 CHEMISTRY (0 0 0 0.5 0 0 0 0.5 0)
6) X2015.Students.who.took.AP=20,533,223,479 18 9 BIOLOGY (0.5 0 0 0 0 0.5 0 0 0)
7) X2015.Students.who.took.AP=171,074,195,526 18 9 PHYSICS 1 (0 0 0 0 0.5 0 0 0 0.5)
8) Mean.Score>=2.935 8 2 CALCULUS BC (0 0.12 0.75 0 0 0 0.12 0 0) *
9) Mean.Score< 2.935 12 6 CALCULUS AB (0 0.5 0 0 0 0 0.5 0 0) *
10) X2015.Students.who.took.AP=152,745 8 0 CHEMISTRY (0 0 0 1 0 0 0 0 0) *
11) X2015.Students.who.took.AP=52,678 8 0 PHYSICS C - MECH (0 0 0 0 0 0 0 1 0) *
12) X2015.Students.who.took.AP=223,479 9 0 BIOLOGY (1 0 0 0 0 0 0 0 0) *
13) X2015.Students.who.took.AP=20,533 9 0 PHYSICS 2 (0 0 0 0 0 1 0 0 0) *
14) X2015.Students.who.took.AP=171,074 9 0 PHYSICS 1 (0 0 0 0 1 0 0 0 0) *
15) X2015.Students.who.took.AP=195,526 9 0 STATISTICS (0 0 0 0 0 0 0 0 1) *

rpart(formula = AP.Math...Science.Courses ~ ., data = crs$dataset[crs$train, c(crs$input, crs$target)], method = "class", parms = list(split = "information"), control = rpart.control(minsplit = 8, minbucket = 8, usesurrogate = 0, maxsurrogate = 0))

Variables actually used in tree construction:
Mean.Score X2015.Students.who.took.AP

Root node error: 63/72 = 0.875

n= 72

        CP  nsplit  rel error   xerror      xstd
1 0.138889       0    1.00000  1.14286  0.000000
2 0.119048       4    0.44444  0.82540  0.060327
3 0.079365       6    0.20635  0.63492  0.066927
4 0.010000       7    0.12698  0.42857  0.065205

Time taken: 0.09 secs
Rattle timestamp: 2016-09-24 13:43:27 KEPAS

Figure 2: Summary of the Decision Tree model for Classification
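As a check on the CP table in Figure 2, the sketch below recomputes the root node error, the rel error column, and the CP values from the node losses. It is a minimal Python illustration, not Rattle or rpart code. The subtree losses of 28 and 13 are an assumption: they are inferred by multiplying the printed rel error values (0.44444 and 0.20635) by the root loss of 63, which gives whole numbers.

```python
from fractions import Fraction

ROOT_N = 72     # training observations ("n= 72" in Figure 2)
ROOT_LOSS = 63  # rows misclassified if every row is labeled BIOLOGY

# Loss (misclassified training rows) of each leaf, nodes 8-15 in Figure 2.
LEAF_LOSS = {8: 2, 9: 6, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0}

# Training loss of each pruned subtree in the CP table, keyed by nsplit.
# 63 and the leaf total (8) come from Figure 2; 28 and 13 are ASSUMED,
# inferred as rel error (0.44444, 0.20635) times the root loss of 63.
SUBTREE_LOSS = {0: ROOT_LOSS, 4: 28, 6: 13, 7: sum(LEAF_LOSS.values())}


def root_node_error() -> Fraction:
    """Share of training rows the root node misclassifies."""
    return Fraction(ROOT_LOSS, ROOT_N)


def rel_errors() -> dict:
    """Subtree loss scaled so that the tree with no splits has rel error 1."""
    return {nsplit: Fraction(loss, ROOT_LOSS) for nsplit, loss in SUBTREE_LOSS.items()}


def cp_column() -> list:
    """CP of every row but the last: drop in rel error per additional split."""
    rel = rel_errors()
    splits = sorted(rel)
    return [float((rel[a] - rel[b]) / (b - a)) for a, b in zip(splits, splits[1:])]


if __name__ == "__main__":
    print("Root node error:", float(root_node_error()))  # 0.875
    for nsplit, err in rel_errors().items():
        print(f"nsplit {nsplit}: rel error {float(err):.5f}")
    print("CP:", [f"{cp:.6f}" for cp in cp_column()])
```

The last row's CP of 0.010000 is not derived this way: it is rpart's default cp setting, the threshold at which the tree stopped growing. Note also that rel error measures fit to the training rows only; xerror, the cross-validated error, is the column usually consulted when deciding how far to prune.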
The model is a fairly large decision tree with seven internal (splitting) nodes and eight leaf nodes. The majority class of the root node (the yval) is BIOLOGY. The 63 is the loss: the number of the 72 training observations that would be misclassified if every observation were assigned to BIOLOGY. The class probabilities (yprob) show that the nine AP subjects are almost evenly represented at the root, each accounting for between 0.083 and 0.12 of the observations, which is why the root node error is 63/72 = 0.875. The algorithm chose 2015 Students who took AP (X2015.Students.who.took.AP) for the first split, which divides the observations 50/50 between node 2 (majority class Chemistry) and node 3 (majority class Biology). Node 2 splits on the same variable into node 4 (majority class Calculus AB), which receives 28% of the observations, and node 5 (majority class Chemistry), which receives 22%.
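The rpart call in Figure 2 sets parms = list(split = "information"), which tells rpart to choose splits with the entropy (information) criterion instead of its default Gini index. The Python sketch below recomputes the information gain of the first split in simplified form; it is not rpart's internal code. The per-class counts are an assumption: they are reconstructed by multiplying each node's printed yprob vector by the node size (72 or 36).

```python
import math

# Per-class counts in the nine-class order of Figure 2's yprob vectors.
# ASSUMPTION: reconstructed as yprob x node size, since rpart prints
# rounded proportions rather than counts.
ROOT = [9, 7, 6, 8, 9, 9, 7, 8, 9]   # node 1, n = 72
NODE2 = [0, 7, 6, 8, 0, 0, 7, 8, 0]  # node 2, n = 36
NODE3 = [9, 0, 0, 0, 9, 9, 0, 0, 9]  # node 3, n = 36


def entropy(counts):
    """Shannon entropy (in bits) of a node's class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


if __name__ == "__main__":
    print(f"root entropy:   {entropy(ROOT):.4f} bits")
    print(f"node 2 entropy: {entropy(NODE2):.4f} bits")
    print(f"node 3 entropy: {entropy(NODE3):.4f} bits")
    print(f"gain of first split: {information_gain(ROOT, [NODE2, NODE3]):.4f} bits")
```

Because no subject appears in both node 2 and node 3, the gain works out to the entropy of the 50/50 split itself, 1 bit, which is the largest gain any two-way split of these 72 rows can achieve. The logarithm base only rescales the values and does not change which split wins.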
On the right side, node 3 splits into node 6 (majority class Biology) and node 7 (majority class Physics 1), each receiving 25% of the observations. The algorithm then uses Mean.Score to split node 4 (Calculus AB) into leaf nodes 8 and 9: leaf node 8 (Mean.Score >= 2.935, 11% of the observations) is labeled Calculus BC, and leaf node 9 (Mean.Score < 2.935, 17%) is labeled Calculus AB. Node 5 (Chemistry) splits into leaf node 10 (Chemistry) and leaf node 11 (Physics C: Mechanics), 11% each. Node 6 (Biology) splits into leaf node 12 (Biology) and leaf node 13 (Physics 2), 12% each. Finally, node 7 (Physics 1) splits into leaf node 14 (Physics 1) and leaf node 15 (Statistics), 12% each.
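The node percentages quoted in the discussion of Figure 2 are each node's share of the 72 training observations. The Python sketch below recomputes them from the n column of Figure 2 and checks two properties of the tree: each pair of child nodes accounts for all of its parent's observations, and no leaf is smaller than the minbucket = 8 set in the rpart call. The helper names are hypothetical.

```python
# Training rows reaching each node of Figure 2 (the "n" column).
NODE_SIZES = {1: 72, 2: 36, 3: 36, 4: 20, 5: 16, 6: 18, 7: 18,
              8: 8, 9: 12, 10: 8, 11: 8, 12: 9, 13: 9, 14: 9, 15: 9}
INTERNAL = range(1, 8)   # nodes 1-7 are split further
LEAVES = range(8, 16)    # nodes 8-15 are terminal (marked * in Figure 2)
MINBUCKET = 8            # rpart.control(minbucket = 8) in the Figure 2 call


def share(node: int) -> float:
    """Percentage of the 72 training rows that reach a node."""
    return 100 * NODE_SIZES[node] / NODE_SIZES[1]


def check_tree() -> None:
    """Raise ValueError if the node sizes are inconsistent with the tree."""
    for k in INTERNAL:
        # rpart numbers the children of node k as 2k and 2k + 1.
        if NODE_SIZES[2 * k] + NODE_SIZES[2 * k + 1] != NODE_SIZES[k]:
            raise ValueError(f"children of node {k} do not add up")
    small = [k for k in LEAVES if NODE_SIZES[k] < MINBUCKET]
    if small:
        raise ValueError(f"leaves below minbucket: {small}")


if __name__ == "__main__":
    check_tree()
    for node in range(2, 16):
        print(f"node {node:2d}: {share(node):5.1f}% of observations")
```

Leaf nodes 12 through 15 each hold 12.5% of the observations, which Rattle's tree plot displays as 12%.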
What this means is that Biology was taken more often than Chemistry, Physics 1, Physics 2, and Physics C: Mechanics, and more often than the math courses. Calculus AB was taken more often than Calculus BC and Statistics. Based on the projections of this model, a college-bound high school student given the choice between an AP math course and an AP science course would take the science course.
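To make the interpretation easier to check, the fitted tree in Figure 2 can be transcribed as plain if/else rules. The sketch below is a hypothetical Python transcription, not code generated by Rattle; the predict function name and the choice to reject count values the tree never saw are assumptions.

```python
# Count values (X2015.Students.who.took.AP) routed to each branch in Figure 2.
NODE2_COUNTS = {118_707, 152_745, 22_789, 302_532, 52_678}
NODE3_COUNTS = {171_074, 195_526, 20_533, 223_479}
NODE4_COUNTS = {118_707, 22_789, 302_532}
NODE6_COUNTS = {20_533, 223_479}


def predict(students_took_ap: int, mean_score: float) -> str:
    """Hypothetical if/else transcription of the Figure 2 tree."""
    if students_took_ap in NODE2_COUNTS:                        # node 2
        if students_took_ap in NODE4_COUNTS:                    # node 4
            # Leaf 8 (>= 2.935) or leaf 9 (< 2.935).
            return "CALCULUS BC" if mean_score >= 2.935 else "CALCULUS AB"
        # Node 5 holds 152,745 (leaf 10) and 52,678 (leaf 11).
        return "CHEMISTRY" if students_took_ap == 152_745 else "PHYSICS C - MECH"
    if students_took_ap in NODE3_COUNTS:                        # node 3
        if students_took_ap in NODE6_COUNTS:                    # node 6
            # Leaf 12 (223,479) or leaf 13 (20,533).
            return "BIOLOGY" if students_took_ap == 223_479 else "PHYSICS 2"
        # Node 7 holds 171,074 (leaf 14) and 195,526 (leaf 15).
        return "PHYSICS 1" if students_took_ap == 171_074 else "STATISTICS"
    # ASSUMPTION: reject counts the tree never saw instead of guessing.
    raise ValueError(f"count {students_took_ap} does not appear in the tree")
```

Written out this way, six of the nine count values lead directly to a single subject, and Mean.Score is used only to separate Calculus BC from Calculus AB.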
Process Documentation. The data set for this research, the National Report, was taken from the College Board's AP Program Participation and Performance Data 2015. The National Report is an Excel workbook with several worksheets containing the following raw data (The College Board, 2016):
1. Number of AP exams taken by high school students, by subject
2. Number of exams by subject for all participating high schools
3. Number of exams by subject accepted by colleges
4. Number of exam takers by AP score, with mean scores, broken down by high school grade, gender, and race/ethnicity
The original data covered 36 subjects with all of the breakdowns above. With all 36 subjects included, the splits the algorithms chose did not produce results useful for answering the research question, so the data set was narrowed to the math and science subjects only. Another problem was the gender and race/ethnicity breakdowns, which made the analysis difficult whenever the algorithms chose to split on one of those variables. Those variables were not removed from the data set. Finally, after the data partition was changed from Rattle's default of 70/15/15 (training/validation/testing) to 80/10/10, the algorithm chose the "number of students completing exams" variable to split the AP subjects. The data set was also converted from an Excel workbook to a comma-delimited (.csv) file.
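The conversion and partitioning steps can be sketched in Python as follows. This is a minimal illustration, not what Rattle does internally: parse_count, load_rows, partition, and the column names in the example are hypothetical, and Python's random generator will not reproduce Rattle's actual partition. One further assumption is worth stating: the split labels in Figure 2 (for example 118,707,152,745) suggest the exam counts were saved with thousands separators, in which case R reads them as text and rpart treats X2015.Students.who.took.AP as a categorical variable rather than a number; stripping the separators before import would let the tree split on a numeric threshold instead. The 90-row example is also an assumption, chosen because 80% of 90 gives the n = 72 training rows reported in Figure 2.

```python
import csv
import io
import random


def parse_count(text: str) -> int:
    """Turn an Excel-formatted count such as "118,707" into 118707."""
    return int(text.replace(",", "").strip())


def load_rows(csv_text: str, count_column: str) -> list:
    """Read CSV text and convert one comma-formatted count column to integers."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row[count_column] = parse_count(row[count_column])
    return rows


def partition(n_rows: int, shares=(0.8, 0.1, 0.1), seed: int = 42):
    """Shuffle row indices and split them into training/validation/testing sets.

    The seed is arbitrary; it only makes the shuffle repeatable.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(shares[0] * n_rows)
    n_valid = int(shares[1] * n_rows)
    return (indices[:n_train],
            indices[n_train:n_train + n_valid],
            indices[n_train + n_valid:])


if __name__ == "__main__":
    # Hypothetical column names; the row pairs CHEMISTRY with its count from Figure 2.
    sample = 'Course,Students\nCHEMISTRY,"152,745"\n'
    print(load_rows(sample, "Students"))
    train, valid, test = partition(90)
    print(len(train), len(valid), len(test))
```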
Figure 3: The College Board's National Report
Evaluation of Results. Although the data analyzed included a wealth of detailed information on the number of students who took AP math and science exams, we did not have information on the number of high school students who went on to enroll in college STEM degree programs. Additionally, although the top 50 U.S. universities that offer STEM degrees were among the colleges accepting AP exams, they were not identified separately in the data set, and their specific prerequisite requirements were not obtained. Such information would have allowed further analysis of whether college-bound students interested in pursuing STEM degrees should take an AP math or an AP science exam.

References
The Associated Press. (2012, May 5). More students taking Advanced Placement classes, but test
pass rate remains about the same. Retrieved from
The College Board. (2016). AP Program Participation and Performance Data 2015 - Research - The College Board. Retrieved from
The College Board. (2016). Number of schools offering AP exams (Rep.). Retrieved
U.S. Department of Education, National Center for Education Statistics. (2012, September).
Percentage of public and private high school graduates taking selected mathematics and
science courses in high school, by selected student and school characteristics: Selected
years, 1990 through 2009. Retrieved from
U.S. Department of Education, National Center for Education Statistics. (2015). Advanced
mathematics and science courses. Retrieved from https://nces.ed.gov/fastfacts
U.S. Department of Education, National Center for Education Statistics. (2016, January).
Number of public school districts and public and private elementary and secondary
schools: Selected years, 1869-70 through 2013-14. Retrieved from
Williams, G. J. (2013). Data mining with Rattle and R: The art of excavating data for knowledge
discovery. New York: Springer.