Intro to Data Science : Problem Set 3
Intro to Data Science Online Course - Udacity
Analyzing Subway Data
- pandas + histogram
- cheating
    plt.figure()
    # your code here to plot a histogram for hourly entries when it is raining
    turnstile_weather['ENTRIESn_hourly'][turnstile_weather['rain'] == 1].hist(bins=20, alpha=0.8)
    # your code here to plot a histogram for hourly entries when it is not raining
    turnstile_weather['ENTRIESn_hourly'][turnstile_weather['rain'] == 0].hist(bins=20, alpha=0.3)
    return plt
- Welch's t test - Wikipedia, the free encyclopedia
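- Welch's t-test (ウェルチのt検定) - Wikipedia
- A refinement of Student's t-test intended for two samples whose variances may be unequal
At this point I was required to register a credit card
The free period is two weeks; after that, $188/month?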
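To make the Welch's t-test note concrete, here is a minimal sketch on synthetic data (not the subway data). The helper name `welch_t` and the sample arrays are assumptions made for illustration; `scipy.stats.ttest_ind` with `equal_var=False` is SciPy's Welch variant, and the manual calculation is only there to show what it computes.

```python
import numpy as np
from scipy import stats


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Per-sample variance of the mean, using the unbiased sample variance.
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return t, df


# Synthetic samples with different variances and sizes (assumed, for illustration only).
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=10.0, scale=1.0, size=200)
sample_b = rng.normal(loc=10.5, scale=3.0, size=50)

t_manual, df = welch_t(sample_a, sample_b)
p_manual = 2 * stats.t.sf(abs(t_manual), df)  # two-sided p-value

# SciPy's built-in version: equal_var=False selects Welch's test.
t_scipy, p_scipy = stats.ttest_ind(sample_a, sample_b, equal_var=False)
```

Both routes should agree up to floating-point error. Note that the t-test assumes roughly normal data, which is probably why the problem set moves on to a rank test for the skewed ridership counts.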
- Mann-Whitney rank test
    with_rain = turnstile_weather['ENTRIESn_hourly'][turnstile_weather['rain'] == 1]
    without_rain = turnstile_weather['ENTRIESn_hourly'][turnstile_weather['rain'] == 0]
    with_rain_mean = np.mean(with_rain)
    without_rain_mean = np.mean(without_rain)
    U, p = scipy.stats.mannwhitneyu(with_rain, without_rain)
Output
Here's your output: (1105.4463767458733, 1090.278780151855, 1924409167.0, 0.024999912793489721)
p-value
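A note on reading that p-value: whether `scipy.stats.mannwhitneyu` reports a one-sided or two-sided p-value by default has, as far as I know, changed between SciPy versions, so it seems safer to pass `alternative` explicitly. The sketch below uses synthetic, skewed data in place of the real `ENTRIESn_hourly` column; the arrays are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the rainy / non-rainy hourly entries (assumed data).
rng = np.random.default_rng(1)
with_rain = rng.exponential(scale=1100.0, size=400)
without_rain = rng.exponential(scale=1000.0, size=600)

# Ask for each alternative explicitly instead of relying on the default.
U_two, p_two = stats.mannwhitneyu(with_rain, without_rain, alternative='two-sided')
U_less, p_less = stats.mannwhitneyu(with_rain, without_rain, alternative='less')
U_greater, p_greater = stats.mannwhitneyu(with_rain, without_rain, alternative='greater')

print("two-sided p:", p_two)
print("one-sided p (less):", p_less)
print("one-sided p (greater):", p_greater)
```

The two-sided p-value comes out at roughly twice the smaller one-sided value, which is worth keeping in mind when comparing a result against a 0.05 threshold.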
compute_cost() and gradient_descent()
- Code for compute_cost & gradient_descent - Udacity Forums
- r2 value is 0.461129068126
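For reference, a minimal sketch of what `compute_cost()` and `gradient_descent()` might look like for linear regression. The signatures, the learning rate `alpha`, and the synthetic data are my assumptions, not the course's reference solution.

```python
import numpy as np


def compute_cost(features, values, theta):
    """Half the mean squared error of the linear model features @ theta."""
    m = len(values)
    residuals = features @ theta - values
    return np.sum(residuals ** 2) / (2 * m)


def gradient_descent(features, values, theta, alpha, num_iterations):
    """Batch gradient descent; returns the final theta and the cost after each step."""
    m = len(values)
    cost_history = []
    for _ in range(num_iterations):
        predictions = features @ theta
        # Gradient of the cost above with respect to theta.
        gradient = features.T @ (predictions - values) / m
        theta = theta - alpha * gradient
        cost_history.append(compute_cost(features, values, theta))
    return theta, cost_history


# Synthetic data (assumed): two standardized features plus an intercept column.
rng = np.random.default_rng(2)
x = rng.normal(size=(200, 2))
features = np.column_stack([np.ones(len(x)), x])
true_theta = np.array([3.0, 1.5, -2.0])
values = features @ true_theta + rng.normal(scale=0.1, size=200)

theta, cost_history = gradient_descent(
    features, values, np.zeros(3), alpha=0.1, num_iterations=1000
)
```

On real data the features usually need to be normalized first, otherwise a fixed `alpha` can make the updates diverge.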
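residuals (Japanese: 残差)
R2: coefficient of determination (Japanese: 決定係数)
- A value that indicates how much of the dependent (explained) variable the independent (explanatory) variables can account for
- Coefficient of determination (決定係数) - Wikipedia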
r_squared = 1 - ( np.sum(np.square(data - predictions)) )/( np.sum(np.square(data - np.mean(data))) )
Your calculated R2 value is: 0.318137233709
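The one-liner above can be wrapped in a small function and sanity-checked on toy values. This is a sketch; the name `compute_r_squared` and the toy arrays are assumptions for illustration.

```python
import numpy as np


def compute_r_squared(data, predictions):
    """R^2 = 1 - SS_res / SS_tot, the same formula as the one-liner above."""
    data = np.asarray(data, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    ss_res = np.sum(np.square(data - predictions))    # residual sum of squares
    ss_tot = np.sum(np.square(data - np.mean(data)))  # total sum of squares
    return 1 - ss_res / ss_tot


data = np.array([1.0, 2.0, 3.0, 4.0])

perfect = compute_r_squared(data, data)                       # predictions equal the data
baseline = compute_r_squared(data, np.full(4, data.mean()))   # always predict the mean
partial = compute_r_squared(data, [1.5, 2.0, 3.0, 3.5])       # SS_res = 0.5, SS_tot = 5.0
```

Perfect predictions give 1, always predicting the mean gives 0, and a model worse than the mean gives a negative value.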
Ordinary least squares
predictions
- udacity/project3-6.py at master · upjohnc/udacity · GitHub
- Your R2 value is: 0.55481832974
- [***SPOILER***] Why my solution is getting a R² of 0.096? - Udacity Forums
- udacity/project3-6.py at master · upjohnc/udacity · GitHub
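A minimal sketch of getting predictions from ordinary least squares, using `np.linalg.lstsq` rather than whatever the linked solutions use. The function name `ols_predictions` and the synthetic data are assumptions for illustration.

```python
import numpy as np


def ols_predictions(features, values):
    """Fit values ~ intercept + features by least squares; return predictions and coefficients."""
    features = np.asarray(features, dtype=float)
    values = np.asarray(values, dtype=float)
    # Prepend a column of ones so the model has an intercept term.
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, values, rcond=None)
    return X @ coef, coef


# Synthetic, noise-free data (assumed): values = 2 + 3*x1 - 1*x2.
rng = np.random.default_rng(3)
features = rng.normal(size=(100, 2))
values = 2.0 + 3.0 * features[:, 0] - 1.0 * features[:, 1]

predictions, coef = ols_predictions(features, values)
```

Unlike gradient descent, this solves for the coefficients directly, so there is no learning rate or iteration count to tune.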
1.5h