Data Analysis

 [Complete] Data Analysis : Sheet1

Business Intelligence : It offers a way to examine trends in collected data and derive insights from them.
48th : On an examination given to 1000 students, Jef’s score of 80 was higher than the scores of 480 students who took the exam. What is the percentile for Jef’s score?
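The percentile item above can be checked with a short calculation. This is a minimal sketch that assumes the "percent of scores strictly below" definition of percentile rank; other textbooks use slightly different conventions, and the function name `percentile_rank` is only illustrative.

```python
def percentile_rank(scores_below: int, total: int) -> float:
    """Percent of all examinees whose score is strictly below the given score."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * scores_below / total


# 480 of the 1000 examinees scored lower than Jef.
jef = percentile_rank(480, 1000)
print(jef)  # 48.0, i.e. the 48th percentile
```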
Correlation : It refers to the degree of relationship between two variables.
likelihood : To estimate the parameters of the model, the ________ function is maximized.
Firth : He proposed the use of a penalized likelihood function.
text mining : It expands available data enormously.
roles : Which is NOT a KR technology?
Standard : The normal distribution with a mean of 0 and standard deviation of 1.
network topology : A network purporting to describe family memberships.
null : Another term for an empty set.
graphs : The following are elements in an analytic plan EXCEPT
Medium for pragmatically diligent interpretation : The following are distinct roles that KR plays EXCEPT
histogram : A graph that is used to indicate frequency distribution.
MYCIN : It sees the medical world as made of empirical associations connecting symptoms to diseases.
hidden : The constant multiplicative factors by which algorithms are related are called _______ constants.
One : The integral of a probability density function over all values of the random variable is equal to ______.
logistic regression : It refers to a frequently used method as it enables binary or polytomous variables to be modelled.
Regression : The equation of the _______ line predicts the value of Y given X.
λ : The symbol used to indicate strings with no elements.
invertible : Matrix B is
Cluster analysis : _____________ includes identifying groups of data records.
relative frequency distribution : It lists the percent of data in a distribution.
multinomial logit model : A model that corresponds to the case where the dependent variable has more than two categories.
SPSS : The following are software used in data mining EXCEPT
rule-based : It views the world in terms of attribute-object-value triples.
chi-square : The following are discrete distributions EXCEPT
Data mining : It is used to discover patterns in large data sets.
space complexity : It relates the length of an algorithm’s input to the number of storage locations it uses.
Poisson probability distribution : It is often used as a model of the number of arrivals at a facility in a given period of time.
9.38 : In the equation of the regression line represented by Y = 1.24X + 6.9, if X = 2 then Y = ?
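The regression-line arithmetic can be reproduced directly. The sketch below hard-codes the slope and intercept from the item above; the helper names `predict_y` and `solve_x` are illustrative only. The same line can also be solved for X when Y is known.

```python
SLOPE = 1.24      # coefficient of X in Y = 1.24X + 6.9
INTERCEPT = 6.9   # constant term


def predict_y(x: float) -> float:
    """Value of Y on the regression line for a given X."""
    return SLOPE * x + INTERCEPT


def solve_x(y: float) -> float:
    """Value of X on the regression line that produces a given Y."""
    return (y - INTERCEPT) / SLOPE


y_at_2 = predict_y(2)
x_at_13_1 = solve_x(13.1)
# Rounded to two decimals to hide floating-point noise.
print(round(y_at_2, 2), round(x_at_13_1, 2))
```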
{I,M,S,P} : Which of the following is a set equal to the distinct letters of the word "MISSISSIPPI"?
Schwarz’s Bayesian Criterion : SBC means _________
Probability density : Which function provides the value of a function at any particular value of x but does NOT directly give the probability of the random variable?
95 : What percent of data will lie within 2 standard deviations of the mean?
DJ Patil : He coined the term "data scientist".
{3,5,6,10,12} : If R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is a binary relation, its range is
joint : Given the sets A = {x/x is a distinct letter in the word "MATHEMATICS"} and B = {x/x is a distinct letter in the word "STATISTICS"}, the two sets are
run time analysis : It is a theoretical classification that estimates and anticipates the increase in running time for algorithms.
Intelligent Reasoning : It is a variety of formal calculation, typically deduction.
KNIME : It is a powerful tool that shows the network of data.
manipulate data efficiently and effectively : What is the focus of ?
95 : What is the value of the mean if a score of 110 is 3 standard deviations above the mean?
1 : A perfect positive correlation coefficient is equal to
Business Intelligence : It transforms data into actionable intelligence for business purposes.
R-programming : It is a free software programming language.
12.25 : If the standard deviation of a distribution is 3.5, the variance is
Java : What programming language is used in RapidMiner?
Expected : The _______ value is the weighted average of the values the random variable may assume.
x increases, y decreases : A negative correlation exists when ___________.
Google Maps : Example of a data product.
i and iv : Which pair belongs to the same family of models called GLM? i) logistic ii) linear regression iii) multinomial regression iv) probability
geometric : The following are continuous distributions EXCEPT
surrogate : KR as a _________ is a substitute for the thing itself.
Median : The score NOT easily affected by extreme values.
normal : A bell-shaped distribution that is symmetric about a vertical line.
critical thinking : According to Hilary Mason, which is NOT a skill that a good data scientist must cultivate?
normal : The most commonly used continuous probability distribution.
worst case : The function describing the performance of an algorithm is usually an upper bound determined from ______ inputs.
logic : It involves a commitment to viewing the world in terms of individual entities and relations between them.
analytics : Data is NOT information unless we add _________.
philosophy : The following provided inspiration for what constitutes intelligent reasoning EXCEPT
A : Which of the matrices is singular?
Poisson and binomial : Two of the most widely used discrete probability distributions.
5x8 : What is the size of the product of a 5x6 and a 6x8 matrix?
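The size rule behind the matrix-product item (an m x n matrix times an n x q matrix gives an m x q matrix) can be written out as a small check. This is a sketch only; `product_shape` is a hypothetical helper, not part of any library.

```python
def product_shape(a_shape: tuple[int, int], b_shape: tuple[int, int]) -> tuple[int, int]:
    """Shape of A @ B, or ValueError if the inner dimensions do not match."""
    (m, n), (p, q) = a_shape, b_shape
    if n != p:
        raise ValueError(f"cannot multiply {m}x{n} by {p}x{q}")
    return (m, q)


print(product_shape((5, 6), (6, 8)))  # (5, 8)
print(product_shape((2, 5), (5, 3)))  # (2, 3)
```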
INTERNIST : It sees a set of prototypes, in particular, to be matched to cases at hand.
regression : Which of the following is a predictive data mining technique?
Turing machine : An example of an abstract computer.
71-89 : What range of values lies within 3 standard deviations above and below the mean if the mean is 80 and the standard deviation is 3?
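The range in the item above is just the mean plus or minus k standard deviations. A minimal sketch, with `k_sd_interval` as an illustrative name:

```python
def k_sd_interval(mean: float, sd: float, k: float) -> tuple[float, float]:
    """Endpoints of the interval lying within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)


low, high = k_sd_interval(80, 3, 3)
print(low, high)  # 71 89
```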
median : The middle-most value in a ranked list of numbers.
Pearson r : Which of the following is used as a method for correlation?
cluster analysis : It includes identifying groups of data records.
Orange : It is a perfect software for machine learning.
Big beta notation : The following are large inputs EXCEPT
5 : The value of X in the regression equation Y = 1.24X + 6.9 if Y = 13.1 is
William Gibson : Who said that "The future is not google-able"?
bivariate : Data involving two variables.
data analysis : The process of inspecting, cleansing, transforming and modelling data with the goal of discovering useful information.
have the same size : Addition and subtraction of two or more matrices is possible only if they
{3,5,6} : If R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is a binary relation, its domain is
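The domain and range of the relation R can be read off as the sets of first and second components of its pairs. The sketch below takes the last pair to be (6, 12), which is an assumption about the intended input.

```python
# The relation as a set of ordered pairs (last pair assumed to be (6, 12)).
R = {(3, 3), (3, 6), (5, 5), (5, 10), (6, 12)}

domain = {first for first, _ in R}    # all first components
range_ = {second for _, second in R}  # all second components

print(sorted(domain))  # [3, 5, 6]
print(sorted(range_))  # [3, 5, 6, 10, 12]
```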
ontological : KR is a set of __________ commitments.
i and ii : Which pair belongs to the same family of models called GLM? i) logistic ii) linear regression iii) multinomial regression iv) probability
Data mining : The goal is to transform raw data into understandable business information.
Receiver Operating Characteristics : ROC means
logit model : The most common function used to link probability to explanatory variables.
PROBIT : The most common functions used to link probability to the explanatory variables are the LOGIT model and the ________ model.
95 : A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many of them lie between 27 and 43?
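The rice-price item can be checked against the normal model it describes. This sketch uses `statistics.NormalDist` from the Python standard library (3.8+) and rounds the expected count to the nearest whole consumer; the variable names are illustrative.

```python
from statistics import NormalDist

# Normal model stated in the item: mean 35, standard deviation 4.
price = NormalDist(mu=35, sigma=4)
consumers = 100

# Share of the distribution between 27 and 43 (two standard deviations each side).
share_between = price.cdf(43) - price.cdf(27)
count_between = round(consumers * share_between)

# The same model gives the share below any single price, e.g. below 39.
count_below_39 = round(consumers * price.cdf(39))

print(count_between, count_below_39)  # 95 84
```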
data visualization : It makes complex data more understandable and usable.
no mode : A data set in which all scores occur the same number of times is said to have
Medium of human expression : It is a language in which we say things about the world.
datalogy : Earlier name for data science.
Mode : The number that occurs most frequently is called ________.
52nd : If there are 103 scores, the median is equal to the _____ ranked score.
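The rank of the median follows from the count of scores: for an odd count n it is the ((n + 1) / 2)th ranked score, and for an even count the median is the average of the two middle ranks. A small sketch, with `median_ranks` as an illustrative name:

```python
def median_ranks(n: int) -> tuple[int, ...]:
    """1-based rank(s) of the score(s) that determine the median of n ranked scores."""
    if n <= 0:
        raise ValueError("n must be positive")
    if n % 2 == 1:
        return ((n + 1) // 2,)
    return (n // 2, n // 2 + 1)


print(median_ranks(103))  # (52,)
print(median_ranks(10))   # (5, 6)
```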
mean = 50, s = 5 : What are the values of the mean and standard deviation in a normal probability density function?
data scientist : He is someone who asks interesting questions on formal and informal theory.
dispersion : Another term for variability.
10 : A score of 50 lies 2 standard deviations above a mean of 30. What is the value of the standard deviation?
studio : It is used for prototyping in RapidMiner.
time complexity : It relates the length of an algorithm’s input to the number of steps it takes.
RapidMiner : _____________ is rated as the number one business analytics software.
Normal : The most widely used continuous probability distribution.
unstructured : What type of text is processed in text analytics?
84 : A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. What percent of the tomatoes weigh less than 0.71 lb?
2x3 : The product of a 2x5 and a 5x3 matrix is a ______ matrix.
confusion matrix : The classification table that XLSTAT can display.
range : The difference between the highest and lowest value.
A + B = B + A : Which of the following is TRUE?
Higher than the mean : A positive z-score means that the score is
7 : How many data mining techniques are there?
KR : It is used to enable an entity to determine consequences by thinking rather than acting.
text mining : Another term for text analytics.
Mean, Median, Mode : Which of the following is TRUE when a distribution is normal?
graph : Which is NOT a basic representation technology?
random variable : It is a numerical function of the outcome of a statistical experiment.
computational complexity theory : is an important part of a broader _____________.
profile likelihood : It does NOT require the assumption that the parameters are normally distributed.
84 : A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many are less than 39?
data visualization : Refers to using tools of statistics to present data visually.
Chi-square : Which of the following is a continuous distribution?
business intelligence : It is used in an organization’s strategic and tactical business decision making.
datafication : The quantification of data into information.
multinomial logit model : It corresponds to the case where the dependent variable has more than 2 categories.
classification : Which of the following data mining techniques is predictive?
velocity : What increases data volume?
Spearman rho : The method of correlation used for ranked scores is ________.
[[-14, -2], [13, 18]] : 3A + B
KNIME : It is popular among financial data analysts.
Knowledge Representation : What is KR?
sequence : A special type of function where the domain is a set of consecutive integers.
λ : Null strings are indicated by
Run-time analysis : It is a theoretical classification that estimates and anticipates the increase in (or run-
1.02 : Which is NOT a value of r?
Hypergeometric : Which of the following is a discrete distribution?
18 : If α = babaa and β = a^6b^5bb, what is the length of the concatenation of the two strings?
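The string-length item can be verified by expanding the exponents (a^6 means six copies of a) and concatenating. A minimal sketch, which also shows that the null string λ has length 0:

```python
alpha = "babaa"                       # length 5
beta = "a" * 6 + "b" * 5 + "bb"       # a^6 b^5 bb, length 6 + 5 + 2 = 13
concatenation = alpha + beta

null_string = ""                      # the string with no elements, written λ

print(len(concatenation))             # 18
print(len(null_string))               # 0
```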
square : A matrix that has the same number of rows and columns is called
number of books : Which is an example of a discrete random variable?
inference : Any way to get new expressions from old ones.
0.206 : The area under the standard normal curve to the right of z = 0.82 is _______.
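The right-tail area is one minus the cumulative probability up to z. A short sketch using the standard normal from `statistics.NormalDist` (Python 3.8+); `right_tail_area` is an illustrative helper name.

```python
from statistics import NormalDist


def right_tail_area(z: float) -> float:
    """Area under the standard normal curve to the right of z."""
    return 1 - NormalDist().cdf(z)


area = right_tail_area(0.82)
print(round(area, 3))  # 0.206
```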
{A,C,I,S,T} : If A = {x/x is a distinct letter in the word "MATHEMATICS"} and B = {x/x is a distinct letter in the word "STATISTICS"}, then their intersection is
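Sets of distinct letters map directly onto Python's built-in `set`, so the intersection above can be checked in a few lines. This is only a sketch; it also shows why the two sets count as joint (they share elements) rather than disjoint.

```python
# Building a set from a word keeps each distinct letter once.
a = set("MATHEMATICS")   # {M, A, T, H, E, I, C, S}
b = set("STATISTICS")    # {S, T, A, I, C}

common = a & b                    # intersection of the two sets
are_disjoint = a.isdisjoint(b)    # False means the sets are joint

print(sorted(common))   # ['A', 'C', 'I', 'S', 'T']
print(are_disjoint)     # False
```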
probability density function : It provides the height or the value of the function at any particular value of x.
150 : A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How many can be expected to weigh more than 0.31 lb in a shipment of 6000 tomatoes?
analysis of algorithms : It is the process of finding the computational complexity of algorithms.
it adheres to the function : Which is NOT a component of KR?