Manipulating large data sets

This forum is intended for discussion around the use of MEPIS in an office setting, as distinct from home or individual usage. Examples include mail merge, database construction, hardware sharing, advertising, etc.
Jerry3904
Forum Veteran
Posts: 23783
Joined: Wed Jul 19, 2006 6:13 am

Manipulating large data sets

#1

Post by Jerry3904 » Mon Mar 07, 2016 10:33 am

I am currently engaged in a research project that involves merging and analyzing large data sets (not "big data" sets, which is an entirely different story). While setting up my methodology, I ran across this excellent (and admittedly geeky) document and thought I would share it in case anyone else comes along with the same needs.

http://moo.nac.uci.edu/~hjm/Manipulatin ... Linux.html

It may be that LO Calc will be sufficient, but I am not sure yet, and I also want to be careful to accommodate future statistical needs (probably R).
Production: 4.15.0-1-amd64, MX-17.1, AMD FX-4130 Quad-Core, GeForce GT 630/PCIe/SSE2, 8 GB, SSD 120 GB, Data 1TB
Testing: AAO 722: 4.15.0-1-386. MX-17.1, AMD C-60 APU, 4 GB
Personal: XPS 13, 4.18.0-19.3-liquorix, 4 GB

Richard
Posts: 2479
Joined: Fri Dec 12, 2008 10:31 am

Re: Manipulating large data sets

#2

Post by Richard » Mon Mar 07, 2016 11:34 am

https://wiki.documentfoundation.org/Faq/Calc/022

What is the maximum number of cells, rows, columns and sheets in a LibreOffice spreadsheet?
The maximum number of columns is 1,024 (from column A to column AMJ);
the maximum number of rows is 1,048,576 (2^20);
the maximum number of cells in one sheet is 1,073,741,824 (2^30, which is more than 1 billion cells);
the maximum number of individual sheets in a complete worksheet is 256.


How large is large?
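One quick way to answer that is to count rows and columns before even trying Calc. A minimal Python sketch against those published limits (the CSV file name is hypothetical):

```python
import csv

# LO Calc limits as quoted from the LibreOffice FAQ above
CALC_MAX_ROWS = 1_048_576   # 2**20
CALC_MAX_COLS = 1_024       # columns A..AMJ

def fits_in_calc(path):
    """Return (rows, cols, fits) for the CSV file at `path`."""
    rows = cols = 0
    with open(path, newline="") as f:
        for record in csv.reader(f):
            rows += 1
            cols = max(cols, len(record))
    return rows, cols, rows <= CALC_MAX_ROWS and cols <= CALC_MAX_COLS

# e.g. fits_in_calc("survey.csv") -- "survey.csv" is a made-up name
```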
Laptop: MX18.1: Thinkpad T430: Dual Core, Intel i5-3320M, Ivy Bridge, 8GB RAM, 4.19.0-1-amd64, 119GB SSD 840PRO, Intel Graphics-Audio-Network
Netbook: MX18.1: AsusTek EeePC 1005HA: Intel Dual Core Atom N270, 1GB RAM, 4.19.0-1-686, 150GB HDD

Jerry3904
Forum Veteran
Posts: 23783
Joined: Wed Jul 19, 2006 6:13 am

Re: Manipulating large data sets

#3

Post by Jerry3904 » Mon Mar 07, 2016 12:09 pm

Thanks. It's not just a question of capacity (which I don't know yet). As my title indicates, I am concerned with manipulation and data analysis capabilities. Gnumeric used to be considered better in this area, but I am not sure that is still the case.

One of the alternatives I am considering is SQLite, in conjunction with SQLiteBrowser. I will know more in another few weeks about what the data actually look like.
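For the record, the SQLite route needs nothing beyond Python's built-in sqlite3 module. A minimal sketch of loading a CSV into a table (the file, table, and column names here are made up for illustration, not from the actual project):

```python
import csv
import sqlite3

def load_scores(csv_path, db_path=":memory:"):
    """Load a two-column CSV (id,score) into a SQLite table
    and return the open connection."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS subjects (id INTEGER, score REAL)")
    with open(csv_path, newline="") as f:
        rows = [(int(r["id"]), float(r["score"])) for r in csv.DictReader(f)]
    conn.executemany("INSERT INTO subjects VALUES (?, ?)", rows)
    conn.commit()
    return conn
```

The same database file can then be opened in SQLiteBrowser for point-and-click inspection.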

dolphin_oracle
Forum Veteran
Posts: 10662
Joined: Sun Dec 16, 2007 1:17 pm

Re: Manipulating large data sets

#4

Post by dolphin_oracle » Mon Mar 07, 2016 12:30 pm

Jerry3904 wrote:Thanks. It's not just a question of capacity (which I don't know yet). As my title indicates, I am concerned with manipulation and data analysis capabilities. Gnumeric used to be considered better in this area, but I am not sure that is still the case.

One of the alternatives I am considering is SQLite, in conjunction with SQLiteBrowser. I will know more in another few weeks about what the data actually look like.
Be careful if you use LibreOffice Base to interface with SQLite. My dad (with a gazillion years' experience as a senior database maintainer and analyst for a couple of different large chemical and materials companies) borked two separate installations of a weight-lifting tracking database app he wrote using LibreOffice, and the crashes corrupted his data. Luckily he had a backup.

Of course, he is also used to working in Oracle DB, so it might have been user-driven problems.
http://www.youtube.com/runwiththedolphin
lenovo ThinkPad T530 - MX-18
lenovo s21e - MX-18, antiX-17.3.1 (live-USB)
FYI: mx "test" repo is not the same thing as debian testing repo.

Jerry3904
Forum Veteran
Posts: 23783
Joined: Wed Jul 19, 2006 6:13 am

Re: Manipulating large data sets

#5

Post by Jerry3904 » Mon Mar 07, 2016 12:32 pm

Thanks for that warning!

My *plan* atm is to set up a SQL database from the beginning, and use SQLiteBrowser to manipulate and analyze the data.
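The kind of manipulation that plan buys you is plain SQL aggregation, whether typed into SQLiteBrowser's Execute SQL tab or scripted. A sketch against an in-memory table with made-up data:

```python
import sqlite3

# Toy data standing in for the real research tables
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (region TEXT, score REAL)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [("north", 4.0), ("north", 6.0), ("south", 3.0)],
)

# Per-group count and mean: the bread and butter of data-set analysis
query = (
    "SELECT region, COUNT(*), AVG(score) FROM responses "
    "GROUP BY region ORDER BY region"
)
for region, n, avg in conn.execute(query):
    print(region, n, avg)
```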

Old Giza
Forum Regular
Posts: 394
Joined: Wed Apr 16, 2014 10:31 pm

Re: Manipulating large data sets

#6

Post by Old Giza » Mon Mar 07, 2016 7:04 pm

Probably irrelevant but in case it's of interest ...

I used to support the processing of thousands of large data sets (surveys/diaries) of several gigabytes or more using a little-known product called TPL, originally created by the US Bureau of Labor Statistics and now supported by ex-employees at QQQSoft. I used both the IBM mainframe version and, later, the Linux one.

The advantages were: no special expertise required (just the ability to read a comprehensive manual), flexible input data description language, non-procedural results type language, publication-ready (PDF) or Web-ready (HTML) output formatting, super-fast results, unlimited input data sets. The restriction was that the analysis was limited to what could be done using sequential input, i.e. tabulations, percentages, etc. But it was surprising what requests could be accomplished doing it that way. Had considered using SQL databases tied into another analysis package, but that would have slowed everything to a crawl. (Disclaimer: I have no ties to the product.)

Gordon Cooper
Forum Guide
Posts: 2276
Joined: Mon Nov 21, 2011 5:50 pm

Re: Manipulating large data sets

#7

Post by Gordon Cooper » Mon Mar 07, 2016 7:25 pm

I was thinking that MariaDB could be another option; it is GPL, but I have not looked at it for years.
Primary: Dell9010, MX-18, Win7, 120 SSD, WD 232GIB HD, 4GB RAM
Backup :Homebrew64 bit Intel duo core 2 GB RAM, 120 GB Kingston SSD, Seagate1TB.
MX-17.1 64bit. Also MX17, Kubuntu14.04 & Puppy 6.3.

Jerry3904
Forum Veteran
Posts: 23783
Joined: Wed Jul 19, 2006 6:13 am

Re: Manipulating large data sets

#8

Post by Jerry3904 » Mon Mar 07, 2016 7:50 pm

10.0 is in the repos, so I will take a look.

Gordon Cooper
Forum Guide
Posts: 2276
Joined: Mon Nov 21, 2011 5:50 pm

Re: Manipulating large data sets

#9

Post by Gordon Cooper » Mon Mar 07, 2016 11:39 pm

10.1 was released late last year, but I have never used it. DBEdit might also provide the manipulation you are looking for.
