Hello, I have been trying to analyze the Sleep Heart Health Study (SHHS) 1 and 2 data in a Jupyter notebook using the Python kernel. For some reason, when I run the command below, I get the following error. From what I read online, it might have something to do with the file being created on a Mac and then opened on Windows. Should I use Ruby to download the CSV file instead?
UnicodeDecodeError                        Traceback (most recent call last)
~\AppData\Local\Temp\1\ipykernel_19908\3877291478.py in <module>
----> 1 dfSH2=pd.read_csv("shhs2_dataset.csv")

~\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    309 stacklevel=stacklevel,
    310 )
--> 311 return func(*args, **kwargs)
    312
    313 return wrapper

~\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    676 kwds.update(kwds_defaults)
    677
--> 678 return _read(filepath_or_buffer, kwds)
    679
    680

~\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py in _read(filepath_or_buffer, kwds)
    579
    580 with parser:
--> 581 return parser.read(nrows)
    582
    583

~\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py in read(self, nrows)
   1251 nrows = validate_integer("nrows", nrows)
   1252 try:
-> 1253 index, columns, col_dict = self._engine.read(nrows)
   1254 except Exception:
   1255 self.close()

~\Anaconda3\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py in read(self, nrows)
    223 try:
    224 if self.low_memory:
--> 225 chunks = self._reader.read_low_memory(nrows)
    226 # destructive to chunks
    227 data = _concatenate_chunks(chunks)

~\Anaconda3\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read_low_memory()

~\Anaconda3\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

~\Anaconda3\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

~\Anaconda3\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 42968: invalid start byte
Thanks for using the site. You don't need Ruby to download the CSV datasets - it's more useful when downloading thousands of EDF and annotation files at once.
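From what I can tell, this is likely an encoding issue: read_csv decodes files as UTF-8 by default, and this file contains at least one byte that isn't valid UTF-8. Byte 0x92 is a right single quotation mark (’) in the Windows-1252 (cp1252) encoding, so the file may have been saved in that encoding. Perhaps you need to look up our CSV dataset's encoding and pass it to read_csv through the encoding parameter (for example encoding="cp1252", if that turns out to be the right one).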
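As a minimal sketch of that check, assuming the CSV is small enough to read into memory: the helper names first_non_utf8_byte and read_csv_with_fallback are hypothetical, and cp1252 is only a guess based on the 0x92 byte, not a confirmed property of the SHHS files. The first function reports where the first non-UTF-8 byte sits in the file; the second retries read_csv with each candidate encoding in turn.

```python
import pandas as pd


def first_non_utf8_byte(path):
    """Hypothetical helper: return (offset, byte value) of the first byte in
    the file that is not valid UTF-8, or None if the whole file decodes."""
    with open(path, "rb") as f:
        data = f.read()  # assumes the CSV fits comfortably in memory
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the offending byte within the whole file
        return err.start, data[err.start]
    return None


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: try each encoding in order and return
    (DataFrame, encoding used). cp1252 is an assumed fallback."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_err = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc), enc
        except UnicodeDecodeError as err:
            last_err = err  # remember the failure and try the next encoding
    raise last_err


# Usage on the file from the question:
# print(first_non_utf8_byte("shhs2_dataset.csv"))
# dfSH2, enc = read_csv_with_fallback("shhs2_dataset.csv")
```

UTF-8 is tried first on purpose: a file that decodes cleanly as UTF-8 is very likely UTF-8, whereas cp1252 accepts almost any byte sequence and so works better as a fallback than as a first guess.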
I found some links that might be helpful:
https://stackoverflow.com/questions/18171739/unicodedecodeerror-when-reading-csv-file-in-pandas-with-python
https://www.kaggle.com/code/paultimothymooney/how-to-resolve-a-unicodedecodeerror-for-a-csv-file
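The read_csv signature in the traceback above also lists an encoding_errors parameter (added in pandas 1.3). If the right encoding can't be pinned down, one last-resort option is to keep UTF-8 and replace undecodable bytes instead of failing. A minimal sketch, using made-up sample bytes rather than the real dataset:

```python
import io

import pandas as pd

# Made-up sample bytes for illustration: 0x92 is a Windows-1252 right single
# quote, which is not valid UTF-8 on its own.
raw = b"id,comment\n1,it\x92s\n"

# encoding_errors (pandas >= 1.3) accepts Python's codec error handlers.
# With "replace", each undecodable byte becomes U+FFFD instead of raising
# UnicodeDecodeError.
df = pd.read_csv(io.BytesIO(raw), encoding="utf-8", encoding_errors="replace")
print(repr(df.loc[0, "comment"]))
```

For the file in the question this would be pd.read_csv("shhs2_dataset.csv", encoding_errors="replace"). The trade-off is that the original character is lost, so finding the file's actual encoding is usually the better fix.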