[Python][Data Science Basics] DataFrame Indexing

코딩고치, June 2, 2020, 05:47

DataFrame Indexing

import pandas as pd
# index_col=0 uses the first CSV column (the miracle names) as the row index
miracle_df = pd.read_csv('test1.csv', index_col=0)
miracle_df
                     FP_cost  Slots_used  Faith_Required
Way of White Corona       15           1              18
Projected Heal            55           1              28
Lighting Arrow            19           1              35
Heal Aid                  27           1               8
Soothing Sunlight         80           1              45
Replenishment             30           1              15
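The file test1.csv itself is not included in the post, so the following is only a sketch for following along without it. The CSV text is an assumption reconstructed from the table printed above, and io.StringIO stands in for the file on disk.

```python
import io

import pandas as pd

# Assumed contents of test1.csv, reconstructed from the printed table.
csv_text = """,FP_cost,Slots_used,Faith_Required
Way of White Corona,15,1,18
Projected Heal,55,1,28
Lighting Arrow,19,1,35
Heal Aid,27,1,8
Soothing Sunlight,80,1,45
Replenishment,30,1,15
"""

# index_col=0 turns the first CSV column into the row labels,
# which is what makes label-based lookups with .loc possible.
miracle_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(miracle_df.index.tolist())
print(miracle_df.columns.tolist())
```

Without index_col=0 the names would become an ordinary column and the rows would be labeled 0 through 5 instead.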
  • Selecting a single value
    • miracle_df.loc['row label', 'column name']
miracle_df.loc['Way of White Corona', 'FP_cost']
15
  • Selecting an entire row
    • miracle_df.loc['row label', :] or miracle_df.loc['row label']
miracle_df.loc['Way of White Corona', :]
FP_cost           15
Slots_used         1
Faith_Required    18
Name: Way of White Corona, dtype: int64
miracle_df.loc['Way of White Corona']
FP_cost           15
Slots_used         1
Faith_Required    18
Name: Way of White Corona, dtype: int64
type(miracle_df.loc['Way of White Corona'])
pandas.core.series.Series
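A small sketch of what the row Series looks like from the inside. The DataFrame is rebuilt inline from the table above (an assumption, since test1.csv is not available); the variable names row and same are illustrative.

```python
import pandas as pd

# Inline copy of the table from the post (assumed to match test1.csv).
miracle_df = pd.DataFrame(
    {
        "FP_cost": [15, 55, 19, 27, 80, 30],
        "Slots_used": [1, 1, 1, 1, 1, 1],
        "Faith_Required": [18, 28, 35, 8, 45, 15],
    },
    index=[
        "Way of White Corona", "Projected Heal", "Lighting Arrow",
        "Heal Aid", "Soothing Sunlight", "Replenishment",
    ],
)

row = miracle_df.loc["Way of White Corona"]

# The row is a Series: its index holds the column names,
# and its name is the row label that was requested.
print(row.index.tolist())
print(row.name)
print(row["FP_cost"])

# Both spellings from the post return the same Series.
same = row.equals(miracle_df.loc["Way of White Corona", :])
print(same)
```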
  • Selecting an entire column
    • miracle_df.loc[:, 'column name'] or miracle_df['column name']
miracle_df.loc[:, 'Faith_Required']
Way of White Corona    18
Projected Heal         28
Lighting Arrow         35
Heal Aid                8
Soothing Sunlight      45
Replenishment          15
Name: Faith_Required, dtype: int64
miracle_df['Faith_Required']
Way of White Corona    18
Projected Heal         28
Lighting Arrow         35
Heal Aid                8
Soothing Sunlight      45
Replenishment          15
Name: Faith_Required, dtype: int64
type(miracle_df['Faith_Required'])
pandas.core.series.Series
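As a quick check that the two column spellings are interchangeable, here is a minimal sketch. The DataFrame is rebuilt inline from the table above (an assumption), and col_a, col_b are illustrative names.

```python
import pandas as pd

# Inline copy of the table from the post (assumed to match test1.csv).
miracle_df = pd.DataFrame(
    {
        "FP_cost": [15, 55, 19, 27, 80, 30],
        "Slots_used": [1, 1, 1, 1, 1, 1],
        "Faith_Required": [18, 28, 35, 8, 45, 15],
    },
    index=[
        "Way of White Corona", "Projected Heal", "Lighting Arrow",
        "Heal Aid", "Soothing Sunlight", "Replenishment",
    ],
)

col_a = miracle_df.loc[:, "Faith_Required"]
col_b = miracle_df["Faith_Required"]
print(col_a.equals(col_b))

# The column Series keeps the row labels as its index,
# so single values can still be looked up by label.
print(col_a["Heal Aid"])

# Series methods work on it directly, for example the label of the largest value.
print(col_a.idxmax())
```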
  • Selecting two columns
    • miracle_df[['column1', 'column2']]
miracle_df[['FP_cost', 'Slots_used']]
                     FP_cost  Slots_used
Way of White Corona       15           1
Projected Heal            55           1
Lighting Arrow            19           1
Heal Aid                  27           1
Soothing Sunlight         80           1
Replenishment             30           1
  • Selecting two rows
    • miracle_df.loc[['row1', 'row2']]
  • The result is two-dimensional, so its type is DataFrame rather than Series
miracle_df.loc[['Way of White Corona', 'Projected Heal']]
                     FP_cost  Slots_used  Faith_Required
Way of White Corona       15           1              18
Projected Heal            55           1              28
type(miracle_df.loc[['Way of White Corona', 'Projected Heal']])
pandas.core.frame.DataFrame
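The rule of thumb seems to be that a list of labels always gives a DataFrame, even when the list has a single entry, while a bare label gives a Series. A minimal sketch, with the DataFrame rebuilt inline from the table above (an assumption) and illustrative variable names:

```python
import pandas as pd

# Inline copy of the table from the post (assumed to match test1.csv).
miracle_df = pd.DataFrame(
    {
        "FP_cost": [15, 55, 19, 27, 80, 30],
        "Slots_used": [1, 1, 1, 1, 1, 1],
        "Faith_Required": [18, 28, 35, 8, 45, 15],
    },
    index=[
        "Way of White Corona", "Projected Heal", "Lighting Arrow",
        "Heal Aid", "Soothing Sunlight", "Replenishment",
    ],
)

as_series = miracle_df.loc["Projected Heal"]    # bare label -> Series
as_frame = miracle_df.loc[["Projected Heal"]]   # one-item list -> DataFrame
print(type(as_series).__name__, type(as_frame).__name__)

# Lists can be given for rows and columns at the same time.
both = miracle_df.loc[
    ["Way of White Corona", "Projected Heal"],
    ["FP_cost", "Faith_Required"],
]
print(both.shape)
```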
  • Slicing
    • miracle_df.loc['data1':'data2']
      • Returns every row from data1 through data2, with data2 included
    • miracle_df.loc[:'data']
      • Returns every row from the first one through data
miracle_df.loc['Projected Heal':'Soothing Sunlight']
                   FP_cost  Slots_used  Faith_Required
Projected Heal          55           1              28
Lighting Arrow          19           1              35
Heal Aid                27           1               8
Soothing Sunlight       80           1              45
miracle_df.loc[:'Soothing Sunlight']
                     FP_cost  Slots_used  Faith_Required
Way of White Corona       15           1              18
Projected Heal            55           1              28
Lighting Arrow            19           1              35
Heal Aid                  27           1               8
Soothing Sunlight         80           1              45
  • To slice columns, write miracle_df.loc[:, 'data1':'data2']
    • The leading ':' selects every row, so the slice applies to the columns
miracle_df.loc[:, 'FP_cost':'Slots_used']
                     FP_cost  Slots_used
Way of White Corona       15           1
Projected Heal            55           1
Lighting Arrow            19           1
Heal Aid                  27           1
Soothing Sunlight         80           1
Replenishment             30           1
miracle_df.loc['Projected Heal':'Heal Aid', 'FP_cost':'Slots_used']
                FP_cost  Slots_used
Projected Heal       55           1
Lighting Arrow       19           1
Heal Aid             27           1
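One detail worth seeing in isolation is that label slices with .loc include the end label, unlike ordinary Python list slices. The sketch below rebuilds the DataFrame inline from the table above (an assumption) and compares the two; the variable names are illustrative.

```python
import pandas as pd

# Inline copy of the table from the post (assumed to match test1.csv).
miracle_df = pd.DataFrame(
    {
        "FP_cost": [15, 55, 19, 27, 80, 30],
        "Slots_used": [1, 1, 1, 1, 1, 1],
        "Faith_Required": [18, 28, 35, 8, 45, 15],
    },
    index=[
        "Way of White Corona", "Projected Heal", "Lighting Arrow",
        "Heal Aid", "Soothing Sunlight", "Replenishment",
    ],
)

# Label slice: both 'Projected Heal' and 'Heal Aid' are part of the result.
label_rows = miracle_df.loc["Projected Heal":"Heal Aid"].index.tolist()
print(label_rows)

# Plain list slice over the same names: the stop position is excluded.
names = miracle_df.index.tolist()
start = names.index("Projected Heal")
stop = names.index("Heal Aid")
list_rows = names[start:stop]
print(list_rows)

# The same inclusive behaviour applies to column slices.
sliced_cols = miracle_df.loc[:, "FP_cost":"Slots_used"].columns.tolist()
print(sliced_cols)
```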
  • Boolean indexing
  • Only the rows marked True are returned
miracle_df.loc[[True, True, False, True, False, False]]
                     FP_cost  Slots_used  Faith_Required
Way of White Corona       15           1              18
Projected Heal            55           1              28
Heal Aid                  27           1               8
  • The boolean list needs exactly one entry per row; passing fewer booleans than there are rows raises an IndexError
miracle_df.loc[[True, False, True]]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-76-90b9654c7f89> in <module>
----> 1 miracle_df.loc[[True, False, True]]

~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1765 
   1766             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1767             return self._getitem_axis(maybe_callable, axis=axis)
   1768 
   1769     def _is_scalar_access(self, key: Tuple):

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1911             return self._get_slice_axis(key, axis=axis)
   1912         elif com.is_bool_indexer(key):
-> 1913             return self._getbool_axis(key, axis=axis)
   1914         elif is_list_like_indexer(key):
   1915 

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getbool_axis(self, key, axis)
   1779         # caller is responsible for ensuring non-None axis
   1780         labels = self.obj._get_axis(axis)
-> 1781         key = check_bool_indexer(labels, key)
   1782         inds = key.nonzero()[0]
   1783         return self.obj._take_with_is_copy(inds, axis=axis)

~\anaconda3\lib\site-packages\pandas\core\indexing.py in check_bool_indexer(index, key)
   2323         # key might be sparse / object-dtype bool, check_array_indexer needs bool array
   2324         result = np.asarray(result, dtype=bool)
-> 2325         result = check_array_indexer(index, result)
   2326 
   2327     return result

~\anaconda3\lib\site-packages\pandas\core\indexers.py in check_array_indexer(array, indexer)
    401         if len(indexer) != len(array):
    402             raise IndexError(
--> 403                 f"Boolean index has wrong length: "
    404                 f"{len(indexer)} instead of {len(array)}"
    405             )

IndexError: Boolean index has wrong length: 3 instead of 6
  • For columns, the boolean list needs one entry per column (three here), so the same list works
miracle_df.loc[:, [True, False, True]]
                     FP_cost  Faith_Required
Way of White Corona       15              18
Projected Heal            55              28
Lighting Arrow            19              35
Heal Aid                  27               8
Soothing Sunlight         80              45
Replenishment             30              15
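The length rule can be checked without reading a full traceback. The sketch below rebuilds the DataFrame inline from the table above (an assumption) and catches the IndexError seen in the post; the names row_mask, short_mask and error_message are illustrative.

```python
import pandas as pd

# Inline copy of the table from the post (assumed to match test1.csv).
miracle_df = pd.DataFrame(
    {
        "FP_cost": [15, 55, 19, 27, 80, 30],
        "Slots_used": [1, 1, 1, 1, 1, 1],
        "Faith_Required": [18, 28, 35, 8, 45, 15],
    },
    index=[
        "Way of White Corona", "Projected Heal", "Lighting Arrow",
        "Heal Aid", "Soothing Sunlight", "Replenishment",
    ],
)

# Six rows, six booleans: accepted.
row_mask = [True, True, False, True, False, False]
selected = miracle_df.loc[row_mask]
print(selected.index.tolist())

# Three booleans against six rows: rejected with IndexError.
short_mask = [True, False, True]
try:
    miracle_df.loc[short_mask]
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)

# The same three booleans fit the three columns.
col_subset = miracle_df.loc[:, short_mask]
print(col_subset.columns.tolist())
```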
  • As in NumPy, a comparison on a column produces the booleans used for filtering
miracle_df['FP_cost'] > 30
Way of White Corona    False
Projected Heal          True
Lighting Arrow         False
Heal Aid               False
Soothing Sunlight       True
Replenishment          False
Name: FP_cost, dtype: bool
miracle_df.loc[miracle_df['FP_cost'] > 30]
                   FP_cost  Slots_used  Faith_Required
Projected Heal          55           1              28
Soothing Sunlight       80           1              45
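It can help to give the boolean Series a name and reuse it. A minimal sketch, assuming the DataFrame rebuilt inline from the table above; the variable name expensive is illustrative.

```python
import pandas as pd

# Inline copy of the table from the post (assumed to match test1.csv).
miracle_df = pd.DataFrame(
    {
        "FP_cost": [15, 55, 19, 27, 80, 30],
        "Slots_used": [1, 1, 1, 1, 1, 1],
        "Faith_Required": [18, 28, 35, 8, 45, 15],
    },
    index=[
        "Way of White Corona", "Projected Heal", "Lighting Arrow",
        "Heal Aid", "Soothing Sunlight", "Replenishment",
    ],
)

# The comparison yields one True/False per row, labeled like the DataFrame.
expensive = miracle_df["FP_cost"] > 30

# True counts as 1, so summing the mask counts the matching rows.
match_count = int(expensive.sum())
print(match_count)

# The mask can be combined with a column label in the same .loc call.
faith_of_expensive = miracle_df.loc[expensive, "Faith_Required"]
print(faith_of_expensive.tolist())
```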
  • Combining two conditions: wrap each condition in parentheses and join them with & (and) or | (or)
(miracle_df['FP_cost'] > 30) & (miracle_df['Faith_Required'] > 20)
Way of White Corona    False
Projected Heal          True
Lighting Arrow         False
Heal Aid               False
Soothing Sunlight       True
Replenishment          False
dtype: bool
miracle_df.loc[(miracle_df['FP_cost'] > 30) & (miracle_df['Faith_Required'] > 20)]
                   FP_cost  Slots_used  Faith_Required
Projected Heal          55           1              28
Soothing Sunlight       80           1              45
(miracle_df['FP_cost'] > 30) | (miracle_df['Faith_Required'] > 20)
Way of White Corona    False
Projected Heal          True
Lighting Arrow          True
Heal Aid               False
Soothing Sunlight       True
Replenishment          False
dtype: bool
miracle_df.loc[(miracle_df['FP_cost'] > 30) | (miracle_df['Faith_Required'] > 20)]
                   FP_cost  Slots_used  Faith_Required
Projected Heal          55           1              28
Lighting Arrow          19           1              35
Soothing Sunlight       80           1              45
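The parentheses matter because & and | bind more tightly than > in Python, and the keywords and/or do not work on a Series at all. The following sketch, with the DataFrame rebuilt inline from the table above (an assumption) and illustrative variable names, sidesteps the precedence issue by naming each condition first.

```python
import pandas as pd

# Inline copy of the table from the post (assumed to match test1.csv).
miracle_df = pd.DataFrame(
    {
        "FP_cost": [15, 55, 19, 27, 80, 30],
        "Slots_used": [1, 1, 1, 1, 1, 1],
        "Faith_Required": [18, 28, 35, 8, 45, 15],
    },
    index=[
        "Way of White Corona", "Projected Heal", "Lighting Arrow",
        "Heal Aid", "Soothing Sunlight", "Replenishment",
    ],
)

high_cost = miracle_df["FP_cost"] > 30
high_faith = miracle_df["Faith_Required"] > 20

# Element-wise AND / OR between the two masks.
both_names = miracle_df.loc[high_cost & high_faith].index.tolist()
either_names = miracle_df.loc[high_cost | high_faith].index.tolist()
print(both_names)
print(either_names)

# The keyword 'and' asks for a single truth value of a whole Series,
# which pandas refuses with a ValueError.
try:
    bool(high_cost and high_faith)
    ambiguous = False
except ValueError:
    ambiguous = True
print(ambiguous)
```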
  • Indexing by integer position
    • miracle_df.iloc[2, 1]
      • Returns the value at row position 2, column position 1 (positions start at 0)
miracle_df.iloc[2, 1]
1
miracle_df.iloc[[1, 2], [1, 0]]
                Slots_used  FP_cost
Projected Heal           1       55
Lighting Arrow           1       19
miracle_df.iloc[1:, 0:2]
                   FP_cost  Slots_used
Projected Heal          55           1
Lighting Arrow          19           1
Heal Aid                27           1
Soothing Sunlight       80           1
Replenishment           30           1
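Positions with .iloc follow ordinary Python conventions: counting starts at 0, slices leave out the stop position, and negative numbers count from the end. A minimal sketch, assuming the DataFrame rebuilt inline from the table above; the variable names are illustrative.

```python
import pandas as pd

# Inline copy of the table from the post (assumed to match test1.csv).
miracle_df = pd.DataFrame(
    {
        "FP_cost": [15, 55, 19, 27, 80, 30],
        "Slots_used": [1, 1, 1, 1, 1, 1],
        "Faith_Required": [18, 28, 35, 8, 45, 15],
    },
    index=[
        "Way of White Corona", "Projected Heal", "Lighting Arrow",
        "Heal Aid", "Soothing Sunlight", "Replenishment",
    ],
)

# Row position 2 is 'Lighting Arrow', column position 1 is 'Slots_used'.
by_position = miracle_df.iloc[2, 1]
by_label = miracle_df.loc["Lighting Arrow", "Slots_used"]
print(by_position, by_label)

# Positional slices exclude the stop: positions 1 and 2 only.
two_rows = miracle_df.iloc[1:3].index.tolist()
print(two_rows)

# Negative positions count from the end, as with Python lists.
last_row_name = miracle_df.iloc[-1].name
print(last_row_name)
```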