Science Memes



Pandas (mander.xyz)

submitted 3 months ago by fossilesque@mander.xyz to c/science_memes@mander.xyz


[–] tequinhu@lemmy.world 10 points 3 months ago (1 children)

It really depends on the machine that is running the code. Pandas will always have the entire thing loaded in memory, and while 600 MB is not a concern for modern laptops running a single analysis at a time, it can get really messy if the person is not thinking about hardware limitations.
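To make the memory point concrete, here is a minimal sketch of how one might compare a CSV's size as text with the footprint of the DataFrame pandas builds from it. The tiny in-memory CSV and the names `csv_text`, `on_disk_bytes`, and `in_memory_bytes` are made up for illustration. The actual ratio depends on the column dtypes and the pandas version.

```python
import io

import pandas as pd

# Hypothetical stand-in for a real file: 1000 rows, one int and one string column.
csv_text = "id,name\n" + "\n".join(f"{i},item{i}" for i in range(1000))

# A plain read_csv call materializes the whole table in memory.
df = pd.read_csv(io.StringIO(csv_text))

# Size of the raw CSV text versus the loaded DataFrame.
on_disk_bytes = len(csv_text.encode("utf-8"))
# deep=True also counts the memory held by string values, not just pointers.
in_memory_bytes = int(df.memory_usage(deep=True).sum())

print(f"CSV text: {on_disk_bytes} bytes, DataFrame: {in_memory_bytes} bytes")
```

Running the same comparison on a sample of the real file gives a rough idea of whether the full load will fit on a given machine.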

[–] naught@sh.itjust.works 8 points 3 months ago (1 children)

Pandas supports lazy loading and can read files in chunks. Hell, even regular ole Python doesn't need to read the whole file at once with the csv module.
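A minimal sketch of both approaches, assuming "lazy loading" here refers to the `chunksize` option of `pandas.read_csv`, which returns an iterator of DataFrames instead of one big frame. The in-memory CSV, the `value` column, and the variable names are hypothetical.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for a large file: a single integer column, 0..9.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))

# pandas: with chunksize set, read_csv yields DataFrames of at most 4 rows each,
# so only one chunk needs to be in memory at a time.
total_pandas = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total_pandas += int(chunk["value"].sum())

# Standard library: csv.DictReader streams the input one row at a time.
total_csv = 0
for row in csv.DictReader(io.StringIO(csv_text)):
    total_csv += int(row["value"])

print(total_pandas, total_csv)
```

With a real file, a path would replace the `io.StringIO` objects. Aggregations that can be combined across chunks (sums, counts, min/max) fit this pattern well; operations that need every row at once do not.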

[–] tequinhu@lemmy.world 3 points 3 months ago* (last edited 3 months ago)

I didn't know about lazy loading, that's cool!

Then I guess the meme doesn't apply anymore. Though I will say that (from my anecdotal experience) people who can use pandas' most advanced features* are also comfortable with other data processing frameworks (usually ones more suitable for large datasets**).

*Anything beyond the standard groupby-apply can be considered advanced, from the places I've been.

**I feel the urge to note that 60 MB isn't a large dataset by any means, but I believe that's beside the point.