Data Visualization and EDA Lab Problem 6

Spitfire 0 Points

Hello,

My idea is to drop the US rows and then, from the new dataframe, get the maximum of a groupby size on native_country.

income_data.native_country[income_data.groupby('native_country').size().max()]

=> result

'United-States'

new_income_data = income_data[~income_data.native_country.str.contains('United-States')]

new_income_data.native_country[new_income_data.groupby('native_country').size().max()]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-337-9414f069507f> in <module>
----> 1 new_income_data.native_country[new_income_data.groupby('native_country').size().max()]

~\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    866         key = com.apply_if_callable(key, self)
    867         try:
--> 868             result = self.index.get_value(self, key)
    869 
    870             if not is_scalar(result):

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   4373         try:
   4374             return self._engine.get_value(s, k,
-> 4375                                           tz=getattr(series.dtype, 'tz', None))
   4376         except KeyError as e1:
   4377             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 643

I would be grateful for any suggestions or pointers.
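
A note on what the traceback suggests: `.max()` on the grouped sizes returns the largest count (a number), and indexing `native_country[...]` with that number looks up a row label, not a country. In the full dataframe a row with that label exists and happens to hold 'United-States'. In the filtered dataframe the original row labels are kept, and the row labelled 643 was one of the removed United-States rows, hence `KeyError: 643`. The sketch below uses `idxmax()` to get the country name instead. The toy dataframe is a hypothetical stand-in for `income_data`, not the lab dataset, and the exact `!=` comparison is swapped in for `str.contains` as an assumption about what is wanted.

```python
import pandas as pd

# Toy stand-in for income_data (hypothetical values, not the lab dataset).
income_data = pd.DataFrame({
    "native_country": ["United-States"] * 4 + ["Mexico"] * 2 + ["Canada"],
})

counts = income_data.groupby("native_country").size()

# .max() gives the largest count (a number), not a country name.
largest_count = counts.max()
# .idxmax() gives the index label of the largest count, i.e. the country.
top_country = counts.idxmax()

# Drop the top country with an exact comparison, then repeat on the rest.
rest = income_data[income_data["native_country"] != top_country]
rest_counts = rest.groupby("native_country").size()
second_country = rest_counts.idxmax()
second_count = rest_counts.max()

print(top_country, largest_count)
print(second_country, second_count)
```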

Tags:

23/06/2019 19:23:29 Data Science

nzihi 2 Points

Hi, colleague,

I also spent some time thinking about this problem, but in the end I arrived at a working solution:

def get_second_highest_num_people(dataframe):
    # value_counts() sorts counts in descending order,
    # so position 0 is the most common country and position 1 is the runner-up
    counts = dataframe['native_country'].value_counts()
    num_people, country = counts.iloc[1], counts.index[1]
    return num_people, country

23/06/2019 21:07:27

jbojilova 2 Points

This worked for me:

# returns top 2 from the Series
top2 = dataframe.groupby("native-country")["native-country"].count().nlargest(2)

# get the 2nd row, i.e. Mexico, 643
country = top2.index[1]
num_people = top2.values[1]

23/06/2019 23:16:43

Spitfire 0 Points

Thanks, colleague!

If we had a large dataset, or could not create new datasets, this solution would probably be the most suitable.

But I read "Write a function to calculate and return the answer given a dataframe." as meaning only that the input is a given dataframe.

In other words, the answer from the colleague below seems to me the easiest to write.
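
One possible way to package the ideas from this thread as a function that takes a dataframe is sketched below. The function name is borrowed from the earlier reply. The `column` parameter is an assumption added here because the thread uses both `native_country` and `native-country` as the column name, and the toy dataframe is hypothetical.

```python
import pandas as pd


def get_second_highest_num_people(dataframe, column="native_country"):
    """Return (num_people, country) for the second most common value in `column`."""
    # value_counts() sorts in descending order, so position 1 is the runner-up.
    counts = dataframe[column].value_counts()
    return int(counts.iloc[1]), counts.index[1]


# Toy data (hypothetical values, not the lab dataset).
toy = pd.DataFrame({
    "native_country": ["United-States"] * 4 + ["Mexico"] * 2 + ["Canada"],
})
num_people, country = get_second_highest_num_people(toy)
print(num_people, country)
```

This does not create a filtered copy of the dataframe. It raises an IndexError if the column holds fewer than two distinct values.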

24/06/2019 09:52:31
